cluster-analysis locality-sensitive-hash minhash

How can I get the similarity matrix from minhash LSH?


I have read many tutorials and tried several MinHash LSH implementations, but none of them produces a similarity matrix; they only return the items whose similarity exceeds the threshold. How can I generate the full matrix? My intention is to use the LSH results for clustering.


Solution

  • The whole point of LSH is to avoid computing all pairwise distances, because that does not scale: n items have n(n-1)/2 pairs.

    If you then put the data into a full distance matrix, you get all the scalability problems back!

    Instead, consider a clustering algorithm such as DBSCAN. It doesn't need a distance matrix, only each point's neighbors within distance epsilon, which is the kind of threshold query LSH is built to answer (approximately).
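To make the DBSCAN suggestion concrete, here is a minimal sketch that clusters using only LSH neighbor queries and never builds a pairwise matrix. It is written in plain Python rather than against any particular MinHash library, so every name in it (`minhash`, `build_buckets`, `neighbors`, `dbscan_lsh`, and the parameters `num_perm`, `bands`, `eps`, `min_samples`) is hypothetical, and the toy `docs` are invented for illustration. It assumes each item is a non-empty set of tokens and that "distance" means Jaccard distance, i.e. 1 minus Jaccard similarity.

```python
import hashlib
from collections import defaultdict, deque

def minhash(tokens, num_perm=64):
    """MinHash signature: the minimum hash value under each of num_perm seeded hashes."""
    def h(seed, tok):
        digest = hashlib.blake2b(f"{seed}:{tok}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big")
    return tuple(min(h(seed, t) for t in tokens) for seed in range(num_perm))

def build_buckets(sigs, bands):
    """Banding: split each signature into bands; items with an equal band share a bucket."""
    rows = len(next(iter(sigs.values()))) // bands
    buckets = defaultdict(set)
    for key, sig in sigs.items():
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(key)
    return buckets, rows

def jaccard(a, b):
    return len(a & b) / len(a | b)

def neighbors(key, sets, sigs, buckets, bands, rows, min_sim):
    """LSH candidates for key, verified with exact Jaccard (the result includes key itself)."""
    sig = sigs[key]
    candidates = set()
    for b in range(bands):
        candidates |= buckets[(b, sig[b * rows:(b + 1) * rows])]
    return {c for c in candidates if jaccard(sets[key], sets[c]) >= min_sim}

def dbscan_lsh(sets, eps=0.5, min_samples=2, num_perm=64, bands=32):
    """DBSCAN where every region query is an LSH lookup; label -1 means noise."""
    sigs = {k: minhash(s, num_perm) for k, s in sets.items()}
    buckets, rows = build_buckets(sigs, bands)
    min_sim = 1.0 - eps  # Jaccard distance <= eps  <=>  similarity >= 1 - eps
    query = lambda k: neighbors(k, sets, sigs, buckets, bands, rows, min_sim)
    labels, cluster = {}, 0
    for k in sets:
        if k in labels:
            continue
        nbrs = query(k)
        if len(nbrs) < min_samples:
            labels[k] = -1  # noise for now; may become a border point later
            continue
        labels[k] = cluster
        queue = deque(nbrs - {k})
        while queue:
            q = queue.popleft()
            if labels.get(q, -1) != -1:
                continue  # already assigned to a cluster
            labels[q] = cluster
            q_nbrs = query(q)
            if len(q_nbrs) >= min_samples:  # q is a core point: keep expanding
                queue.extend(n for n in q_nbrs if labels.get(n, -1) == -1)
        cluster += 1
    return labels

# Toy data, invented for this example: two groups of near-duplicates and one outlier.
docs = {
    "a": set("the quick brown fox jumps over the lazy dog".split()),
    "b": set("the quick brown fox jumped over the lazy dog".split()),
    "c": set("the quick brown fox leaps over the lazy dog".split()),
    "d": set("minhash estimates jaccard similarity between sets".split()),
    "e": set("minhash estimates jaccard similarity of two sets".split()),
    "f": set("completely unrelated sentence here".split()),
}
labels = dbscan_lsh(docs)
print(labels)
```

Only the pairs that collide in at least one LSH bucket are ever compared, so the work grows with the number of candidate pairs rather than with all n(n-1)/2 pairs. Because LSH is approximate, a true neighbor can occasionally be missed; using more bands with fewer rows each makes that less likely at the cost of more candidates to verify.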