Search code examples
locality-sensitive-hash

Locality sensitive hashing - what happens when a bucket is empty?


Assume I've constructed an LSH database according to some set of hashes, and I'm now beginning to query the database to find approximate nearest neighbors.

Are there any guidelines to what happens when you compute the hash for a query point, and the corresponding bucket is empty? Similarly, say I want to find the 5 approximate nearest neighbors, and the bucket has only 4 other data points?


Solution

  • I believe getting too few points for a retrieval means that you have too many buckets for your training data. And that is application dependent of course. Take a look at LSH toolbox by Greg Shakhnarovich implementation and his README file. In this implementation, fewer hash functions (smaller k) means fuller buckets, and that in turn means slower LSH.