Tags: python, hash, cross-platform

Positive integer from Python hash() function


I want to use the Python hash() function to get integer hashes from objects. But the built-in hash() can return negative values, and I want only non-negative ones. I also want it to work sensibly on both 32-bit and 64-bit platforms.

I.e. on 32-bit Python, hash() can return an integer in the range -2**31 to 2**31 - 1. On 64-bit systems, hash() can return an integer in the range -2**63 to 2**63 - 1.

But I want a hash in the range 0 to 2**32-1 on 32-bit systems, and 0 to 2**64-1 on 64-bit systems.

What is the best way to convert the hash value to its equivalent positive value within the range of the 32- or 64-bit target platform?

(Context: I'm trying to make a new random.Random style class. According to the random.Random.seed() docs, the seed "optional argument x can be any hashable object." So I'd like to duplicate that functionality, except that my seed algorithm can't handle negative integer values, only positive.)


Solution

  • To support platforms with signed and unsigned hash(), you could use

    import sys
    hash(x) % 2**sys.hash_info.width
    

    This uses the actual hash width as reported by Python, rather than inferring it from what Python deems to be the maximum size of a list on the platform (sys.maxsize). The % operation maps negative numbers to positive numbers, e.g. a hash value of -1 would become 2**sys.hash_info.width - 1.

    Note that for a small non-negative integer x, hash(x) is the identity function, i.e. it returns x unchanged. More generally, for an integer x on 64-bit Python 3.6, it seems to compute

    
    (abs(x) % m) * (-1 if x < 0 else 1)

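    with m = 2**61 - 1, the 9th Mersenne prime. This can be problematic in some applications. For example, filling a hash table of 1000 entries with values x = int(time_YYYYMMDDhhmm) would result in a higher density of items at indices 0 to 359 than at 400 to 959, and zero density in every range k*100 + 60 to k*100 + 99. Assuming the table index is hash(x) % 1000, the index is just the last three digits of the timestamp, i.e. the last digit of the hour followed by the minutes.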
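
To make the above concrete, here is a minimal sketch rather than a definitive implementation. The names positive_hash, model_int_hash and bucket_counts are made up for illustration, the table index is assumed to be the hash modulo the table size, and the date in the timestamps is arbitrary. The -1 to -2 substitution in model_int_hash is a CPython detail (hash() never returns -1 there), added so that the model matches hash() for every integer.

```python
import sys
from collections import Counter

HASH_RANGE = 2 ** sys.hash_info.width  # 2**64 on typical 64-bit builds


def positive_hash(obj):
    """Return hash(obj) mapped into the range 0 .. HASH_RANGE - 1."""
    return hash(obj) % HASH_RANGE


def model_int_hash(x):
    """Model of the integer hash formula quoted above (CPython-specific)."""
    m = sys.hash_info.modulus  # 2**61 - 1 on typical 64-bit builds
    h = (abs(x) % m) * (-1 if x < 0 else 1)
    # CPython never returns -1 from hash(); it substitutes -2.
    return -2 if h == -1 else h


def bucket_counts(values, table_size=1000):
    """Count how many values land in each index of a hypothetical table."""
    return Counter(positive_hash(v) % table_size for v in values)


# One (arbitrary) day of YYYYMMDDhhmm timestamps, one per minute.
stamps = [202401150000 + hh * 100 + mm for hh in range(24) for mm in range(60)]
counts = bucket_counts(stamps)

print(positive_hash(-5) == HASH_RANGE - 5)
print(all(model_int_hash(x) == hash(x) for x in (0, 7, -1, 2**61, -2**70)))
print(counts[0], counts[400], counts[60])
```

On a 64-bit CPython build the first two lines should print True, and the last line shows the uneven density: index 0 is hit by hours 00, 10 and 20, index 400 only by hours 04 and 14, and index 60 never, because no minute value reaches 60. For the seeding use case in the question, positive_hash(seed) would be the value handed to the seed algorithm.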