Comparison with Google ScaNN

ScaNN

ScaNN (Scalable Nearest Neighbors) is Google’s state-of-the-art vector similarity search library that combines quantization, dimensionality reduction, and efficient search algorithms to enable fast and accurate approximate nearest neighbor search. Below we compare PatANN and ScaNN across various datasets and metrics.

Query Time vs. Recall@10 Comparison

[Figure: Query Time vs Recall Graph]

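Detailed Comparison (SIFT1M, k=10)

Performance Metrics

| Library | Geometric Mean QPS | Median QPS | QPS@95% | Weighted Avg QPS | AUC Value | AUC Normalized |
|---------|--------------------|------------|---------|------------------|-----------|----------------|
| PatANN | 182,526.33 | 190,849.94 | 357,320.88 | 223,090.29 | 83,919.87 | 223,090.29 |
| ScaNN | 40,818.24 | 45,086.66 | 81,244.97 | 57,529.09 | 29,110.29 | 57,529.09 |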

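Recall at Different QPS Levels

| Library | Recall@10,000 QPS | Recall@50,000 QPS | Recall@100,000 QPS | Recall@200,000 QPS |
|---------|-------------------|-------------------|--------------------|--------------------|
| PatANN | 0.99991 | 0.99991 | 0.99991 | 0.99944 |
| ScaNN | 0.99691 | 0.93428 | - | - |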
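The page does not say how a "Recall@N QPS" cell is derived. One plausible reading, sketched below as an assumption, is the best recall among benchmark operating points that sustain at least N queries per second, with a dash when no point is fast enough. The function `recall_at_qps` and the points in `toy_curve` are hypothetical and purely illustrative; they are not measured data from this comparison.

```python
from typing import Optional, Sequence, Tuple


def recall_at_qps(
    curve: Sequence[Tuple[float, float]], min_qps: float
) -> Optional[float]:
    """Best recall among (recall, qps) points that sustain at least min_qps.

    Returns None when no operating point is fast enough, which a table
    would render as "-".
    """
    eligible = [recall for recall, qps in curve if qps >= min_qps]
    return max(eligible) if eligible else None


# Hypothetical (recall, QPS) operating points, for illustration only.
toy_curve = [(0.80, 120_000.0), (0.95, 60_000.0), (0.99, 15_000.0)]

for level in (10_000, 50_000, 100_000, 200_000):
    value = recall_at_qps(toy_curve, level)
    print(f"Recall@{level:,} QPS:", "-" if value is None else value)
```

Under this reading, a dash means only that no tested configuration reached the throughput level. A benchmark could instead interpolate between neighbouring points, which would give slightly different values.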

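Algorithm Parameters

| Library | Points | Min K | Max K | K Range | Min Epsilon | Max Epsilon | Median Epsilon |
|---------|--------|-------|-------|---------|-------------|-------------|----------------|
| PatANN | 280 | 0.62374 | 0.99991 | 0.37617 | 0.72420 | 1 | 0.99718 |
| ScaNN | 33 | 0.49090 | 0.99691 | 0.50601 | 0.56457 | 0.99977 | 0.97712 |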
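The column definitions are not given on this page, but the numbers suggest how two of them relate. The Max K values match the best recall figures reported above (0.99991 and 0.99691), and dividing the AUC Value by the K Range reproduces the AUC Normalized column. The sketch below checks that relationship. Reading "K" as recall and the AUC as the area under the QPS-vs-recall curve is an assumption inferred from the figures, and `normalized_auc` is a hypothetical helper name.

```python
# Values copied from the Performance Metrics and Algorithm Parameters tables.
rows = {
    "PatANN": {"auc": 83_919.87, "auc_normalized": 223_090.29,
               "min_k": 0.62374, "max_k": 0.99991},
    "ScaNN": {"auc": 29_110.29, "auc_normalized": 57_529.09,
              "min_k": 0.49090, "max_k": 0.99691},
}


def normalized_auc(auc: float, min_k: float, max_k: float) -> float:
    """Assumed definition: area under the curve divided by the covered K range."""
    return auc / (max_k - min_k)


relative_errors = {}
for name, row in rows.items():
    derived = normalized_auc(row["auc"], row["min_k"], row["max_k"])
    relative_errors[name] = abs(derived - row["auc_normalized"]) / row["auc_normalized"]
    print(f"{name}: derived {derived:,.2f}, "
          f"reported {row['auc_normalized']:,.2f}, "
          f"relative error {relative_errors[name]:.1e}")
```

If this reading is right, it would also explain why the AUC Normalized and Weighted Avg QPS columns hold identical values: an area divided by the width of its range is an average QPS over that range.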

Key Findings

PatANN outperforms ScaNN by a wide margin on SIFT1M: its geometric mean QPS is about 4.5x higher (182,526 vs. 40,818) and its weighted average QPS about 3.9x higher (223,090 vs. 57,529). ScaNN maintains excellent recall (99.7%) at 10,000 QPS and good recall (93.4%) at 50,000 QPS, but the table reports no ScaNN recall at 100,000 or 200,000 QPS, indicating that none of its tested configurations reached those throughput levels. PatANN holds recall above 99.9% at every tested level, including 200,000 QPS.
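The speedup figures can be reproduced from the Performance Metrics table. The short sketch below recomputes the PatANN-to-ScaNN ratio for each QPS metric; the dictionary keys and the `speedup` helper are illustrative names, not part of either library.

```python
# Values copied from the Performance Metrics table (SIFT1M, k=10).
metrics = {
    "PatANN": {
        "geometric_mean_qps": 182_526.33,
        "median_qps": 190_849.94,
        "qps_at_95": 357_320.88,
        "weighted_avg_qps": 223_090.29,
    },
    "ScaNN": {
        "geometric_mean_qps": 40_818.24,
        "median_qps": 45_086.66,
        "qps_at_95": 81_244.97,
        "weighted_avg_qps": 57_529.09,
    },
}


def speedup(metric: str) -> float:
    """Ratio of PatANN to ScaNN for a single metric."""
    return metrics["PatANN"][metric] / metrics["ScaNN"][metric]


for name in metrics["PatANN"]:
    print(f"{name}: {speedup(name):.1f}x")
```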

ScaNN combines partitioning with anisotropic quantization, which makes it more effective than traditional methods. Even so, PatANN’s pattern-aware approach scales better and delivers higher recall at every tested throughput level. ScaNN’s results come from only 33 data points compared to PatANN’s 280, which may suggest that it is more parameter-efficient, but it does not reach PatANN’s peak performance.