Search code examples
indexingstring-matchingnearest-neighborlocality-sensitive-hash

Use Locality Sensitive Hashing on dynamic data set


I am using LSH for database records and by that I am creating a index (not a database index, a simple hashmap) where similar records blocked in to the same bucket. The database may contain several millions of records. My question regards with the design I post below.

enter image description here

First I will create the index using the database available by executing LSH. But when a new record inserted in to the database I must index that record also in to the index. How can I do this using LSH? Can LSH allocate that record to the bucket that have similar records?? Does LSH support updates in to the dataset?


Solution

  • I would use E2LSH (which is developed by Andoni, which is a great guy), which is written in C++. In the site of the project, it is mentioned:

    Newest (not quite) LSH algorithms (2014): These algorithms achieve performance better than the classic LSH algorithms by using data-dependent hashing. They improve over classic LSH algorithms for both Hamming and Euclidean space. These algorithms are not dynamic however, in contrast to the classic LSH algorithms, which use data-independent hashing and hence allow updates to the pointset.

    If you do not want to use a library, but for some reason you want to develop your own, I would suggest you to study the manual first.