I want to use the Python hash() function to get integer hashes from objects. But the built-in hash() can give negative values, and I want only positive ones. And I want it to work sensibly on both 32-bit and 64-bit platforms.

I.e. on 32-bit Python, hash() can return an integer in the range -2**31 to 2**31 - 1. On 64-bit systems, hash() can return an integer in the range -2**63 to 2**63 - 1.
But I want a hash in the range 0 to 2**32 - 1 on 32-bit systems, and 0 to 2**64 - 1 on 64-bit systems.
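For example, on a 64-bit build (a small negative int hashes to itself, so this output is deterministic):

>>> hash(-1000)
-1000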
What is the best way to convert the hash value to its equivalent positive value within the range of the 32- or 64-bit target platform?
(Context: I'm trying to make a new random.Random-style class. According to the random.Random.seed() docs, the seed "optional argument x can be any hashable object." So I'd like to duplicate that functionality, except that my seed algorithm can't handle negative integer values, only positive.)
To support platforms with both 32-bit and 64-bit hash(), you could use

hash(x) % 2**sys.hash_info.width

This uses the actual hash width as reported by Python, rather than guessing it from sys.maxsize (what Python deems to be the maximum size of a list on the platform). The % operation maps negative numbers to positive ones; e.g. a hash of -1 becomes 2**sys.hash_info.width - 1.
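As a minimal, runnable sketch (positive_hash is just an illustrative name, not anything standard):

import sys

def positive_hash(obj):
    # Map hash(obj) from the signed range into 0 .. 2**width - 1.
    return hash(obj) % 2**sys.hash_info.width

On a 64-bit build, positive_hash(42) is 42, while a hash value of -1 maps to 2**64 - 1. (One wrinkle: hash(-1) itself is -2 in CPython, because -1 is reserved as an error indicator at the C level, so positive_hash(-1) actually gives 2**64 - 2.)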
Note that if x is a non-negative integer below the hash modulus, hash(x) is the identity function, i.e. it just passes the value through. In general, on 64-bit CPython 3.6, it seems to compute

(abs(x) % m) * (-1 if x < 0 else 1)

with m = 2**61 - 1, the 9th Mersenne prime. This can be problematic in some applications, e.g. filling a hash table with 1000 slots with values x = int(time_YYYYMMDDhhmm) would result in a higher density of items at indices 0 to 359 than at indices 400 to 959 (and zero density in any range k*100 + 60 to k*100 + 99).
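This modular behavior can be spot-checked directly; the sketch below relies on sys.hash_info.modulus, which CPython exposes as the prime modulus of its numeric hash scheme:

import sys

# sys.hash_info.modulus is the prime modulus of CPython's numeric hash
# scheme; it is 2**61 - 1 on 64-bit builds.
m = sys.hash_info.modulus

# Spot-check the observed formula (x == -1 is excluded, since hash(-1) == -2
# as noted above).
for x in (5, m - 1, m, m + 7, -12345):
    assert hash(x) == (abs(x) % m) * (-1 if x < 0 else 1)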