Search code examples
pythonlisthashnearest-neighborlocality-sensitive-hash

How to hash lists?


Lists are not hashable. However, I am implementing LSH and I am seeking for a hash function that will correspond a list of positive integers (in [1, 29.000]) to k buckets. The number of lists is D, where D > k (I think) and D = 40.000, where k is not yet known (open to suggestions).


Example (D = 4, k = 2):

118 | 27 | 1002 | 225
128 | 85 | 2000 | 8700
512 | 88 | 2500 | 10000
600 | 97 | 6500 | 24000
800 | 99 | 7024 | 25874

The first column should be given as input to the hash function and return the number of a bucket.


What confuses me is that we do not seek for a function to hash a number, but a column, i.e. a list of positive integers.

Any ideas please?

I am using if that matters


Solution

  • You can just convert it in a hashable type before:

    In [4]: hash(l)
    TypeError: unhashable type: 'list'
    
    hash(tuple(l)) % k  # 29000
    Out[5]: 70846