Gaussian Kernel-Based LSH for High-Dimensional Similarity Search

Research output: Contribution to journal (Article, peer-reviewed)

Abstract

High-dimensional similarity search remains a critical challenge in machine learning, particularly when data lie on complex, non-linear manifolds that undermine the effectiveness of classical Locality-Sensitive Hashing (LSH). This work introduces Gaussian LSH, a kernel-based hashing framework that integrates over-clustering with Gaussian probability density modelling to improve locality preservation while maintaining computational efficiency. The method generates compact binary codes from a hybrid kernel–PDF score and supports scalable GPU-accelerated indexing for large datasets. Empirical evaluations across multiple visual and textual benchmarks demonstrate consistent improvements in recall and query latency compared to representative LSH variants and approximate nearest neighbour libraries. Gaussian LSH achieves recall gains of up to 9 pp and latency reductions of up to 4.3×, with benefits sustained across a range of code lengths. These results highlight the approach’s scalability and accuracy, supporting its use in medium- to large-scale similarity retrieval tasks across diverse data domains.
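The idea of turning Gaussian scores against cluster centres into binary codes can be illustrated with a toy sketch. This is not the paper's algorithm: the abstract does not give the form of the hybrid kernel–PDF score, the clustering procedure, or the thresholding rule, so everything below is an assumption made for illustration. The hand-picked `centers` stand in for the output of an over-clustering step, `sigma` is an arbitrary bandwidth, and the fixed `threshold` of 0.5 is a hypothetical binarisation rule.

```python
import math


def gaussian_score(x, center, sigma):
    """Gaussian (RBF) kernel value between a point and a cluster centre."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def hash_code(x, centers, sigma, threshold=0.5):
    """Emit one bit per centre: 1 if the kernel score reaches the threshold.

    The threshold rule is an assumption, not taken from the paper.
    """
    return tuple(
        1 if gaussian_score(x, c, sigma) >= threshold else 0 for c in centers
    )


def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    return sum(u != v for u, v in zip(a, b))


# Hypothetical centres standing in for an over-clustering step.
centers = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
sigma = 1.0  # assumed bandwidth

query = (0.1, 0.1)
near = (0.2, 0.0)   # close to the query
far = (5.1, 4.9)    # in a different region of the space

code_query = hash_code(query, centers, sigma)
code_near = hash_code(near, centers, sigma)
code_far = hash_code(far, centers, sigma)

print(code_query, code_near, code_far)
print(hamming(code_query, code_near), hamming(code_query, code_far))
```

In this toy setting, nearby points share a code while the distant point differs in every bit, which is the locality-preserving behaviour that a hashing scheme of this kind aims for. A practical system would learn the centres from data and tune the bandwidth and code length.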

Original language: English
Pages (from-to): 1402-1413
Number of pages: 12
Journal: IEEE Open Journal of the Computer Society
Volume: 6
Publication status: Published - 25 Aug 2025

Keywords

  • Accuracy
  • Approximate nearest-neighbour search
  • Costs
  • High-dimensional data
  • Indexing
  • Kernel
  • Kernel-based hashing
  • Locality-sensitive hashing
  • Manifolds
  • Memory management
  • Scalability
  • Sensitivity
  • Tuning
  • Vectors
