# Comparison with Google ScaNN

## ScaNN
ScaNN (Scalable Nearest Neighbors) is Google’s state-of-the-art vector similarity search library that combines quantization, dimensionality reduction, and efficient search algorithms to enable fast and accurate approximate nearest neighbor search. Below we compare PatANN and ScaNN across various datasets and metrics.
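For context, here is a minimal sketch of how a ScaNN index is typically built and queried in Python, following the builder pattern from ScaNN's README. The dataset, dimensionality, and parameter values below are illustrative, not the benchmark configuration used for the numbers in this section:

```python
import numpy as np
import scann  # pip install scann

# Illustrative random data; the benchmark below uses SIFT1M, not this.
dataset = np.random.rand(100_000, 128).astype(np.float32)
queries = np.random.rand(10, 128).astype(np.float32)

# Normalize so that dot product is equivalent to cosine similarity.
dataset /= np.linalg.norm(dataset, axis=1, keepdims=True)

# Partitioning tree + asymmetric hashing (anisotropic quantization)
# + exact re-scoring of the top candidates.
searcher = (
    scann.scann_ops_pybind.builder(dataset, 10, "dot_product")
    .tree(num_leaves=1000, num_leaves_to_search=100, training_sample_size=50_000)
    .score_ah(2, anisotropic_quantization_threshold=0.2)
    .reorder(100)
    .build()
)

neighbors, distances = searcher.search_batched(queries)
print(neighbors.shape)  # (10, 10): top-10 neighbor indices per query
```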
## Query Time vs. Recall@10 Comparison
## Detailed Comparison (SIFT1M, k=10)

### Performance Metrics
| Library | Geometric Mean QPS | Median QPS | QPS@95% | Weighted Avg QPS | AUC Value | AUC Normalized |
|---|---|---|---|---|---|---|
| PatANN | 182,526.33 | 190,849.94 | 357,320.88 | 223,090.29 | 83,919.87 | 223,090.29 |
| ScaNN | 40,818.24 | 45,086.66 | 81,244.97 | 57,529.09 | 29,110.29 | 57,529.09 |
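Summary statistics like these are derived from the per-configuration (recall, QPS) measurements of a parameter sweep. The sketch below shows one plausible set of definitions, assuming "QPS@95%" means the best throughput among configurations reaching at least 0.95 recall and "AUC Value" means the trapezoidal area under the QPS-vs-recall curve; both readings are our interpretation of the column names, not documented formulas, and the input values are illustrative:

```python
import numpy as np

# Hypothetical (recall@10, QPS) pairs from a parameter sweep;
# illustrative values, not the measured benchmark data.
recall = np.array([0.62, 0.85, 0.95, 0.99, 0.9999])
qps    = np.array([360_000, 310_000, 250_000, 190_000, 150_000], dtype=float)

geo_mean = np.exp(np.log(qps).mean())   # geometric mean QPS
median   = np.median(qps)               # median QPS

# Assumed reading of "QPS@95%": best throughput at >= 0.95 recall.
qps_at_95 = qps[recall >= 0.95].max()

# Assumed reading of "AUC Value": trapezoidal area under the
# QPS-vs-recall trade-off curve.
order = np.argsort(recall)
r, q = recall[order], qps[order]
auc = float(((q[:-1] + q[1:]) / 2 * np.diff(r)).sum())

print(geo_mean, median, qps_at_95, auc)
```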
### Recall at Different QPS Levels
| Library | Recall@10,000 QPS | Recall@50,000 QPS | Recall@100,000 QPS | Recall@200,000 QPS |
|---|---|---|---|---|
| PatANN | 0.99991 | 0.99991 | 0.99991 | 0.99944 |
| ScaNN | 0.99691 | 0.93428 | - | - |

A dash indicates that the library did not sustain that throughput level in the benchmark.
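Fixed-throughput recall figures like these are typically read off the measured trade-off curve. One simple way to do that is linear interpolation between adjacent measured points, as sketched below with illustrative data (not the benchmark measurements):

```python
import numpy as np

# Hypothetical measured (QPS, recall@10) points, sorted by increasing QPS.
qps    = np.array([  5_000, 40_000, 120_000, 250_000], dtype=float)
recall = np.array([0.9995,  0.998,   0.990,    0.95])

def recall_at_qps(target_qps):
    """Linearly interpolate recall at a given throughput level.

    Returns None when the target exceeds the highest measured QPS,
    corresponding to the '-' entries in the table above.
    """
    if target_qps > qps.max():
        return None
    return float(np.interp(target_qps, qps, recall))

for level in (10_000, 50_000, 100_000, 200_000):
    print(level, recall_at_qps(level))
```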
### Algorithm Parameters
| Library | Points | Min Recall@10 | Max Recall@10 | Recall Range | Min Epsilon | Max Epsilon | Median Epsilon |
|---|---|---|---|---|---|---|---|
| PatANN | 280 | 0.62374 | 0.99991 | 0.37617 | 0.72420 | 1 | 0.99718 |
| ScaNN | 33 | 0.49090 | 0.99691 | 0.50601 | 0.56457 | 0.99977 | 0.97712 |
## Key Findings
PatANN outperforms ScaNN by a wide margin, delivering 4.5x the geometric mean QPS and 3.9x the weighted average QPS. ScaNN maintains excellent recall (99.7%) at 10,000 QPS and good recall (93.4%) at 50,000 QPS, but it does not sustain throughput beyond 50,000 QPS, whereas PatANN keeps recall above 99.9% even at 200,000 QPS.
ScaNN achieves its performance through anisotropic quantization combined with partitioning, which makes it more effective than traditional quantization methods. However, PatANN's pattern-aware approach demonstrates superior scalability and consistently higher recall at every tested throughput level. ScaNN's results also come from only 33 measured data points versus PatANN's 280, suggesting it may require less parameter tuning but covers a narrower range of operating points and cannot reach the highest performance levels.