Search code examples
pythonsecurityhash

Why python uses predictable hashes for numbers?


The Python doc says:

By default, the __hash__() values of str and bytes objects are “salted” with an unpredictable random value. Although they remain constant within an individual Python process, they are not predictable between repeated invocations of Python. This is intended to provide protection against a denial-of-service caused by carefully chosen inputs that exploit the worst case performance of a dict insertion, O(n2) complexity.

But the algorithm used for computing hash() of numbers is deterministic. (It only uses salt for str and bytes). Why an attacker can not use integers to run DoS by the same reason?


Solution

  • The Python bug tracker (Issue 13703) captures a lot of discussion that led to this choice.

    Particularly significant factors were:

    1. JSON and XML-RPC messages are always interpreted with strings, not ints, as dict keys. (See relevant messages from the thread for JSON and XML-RPC.) So in the most plausible security threat context (requests fired at a service on the web), int hashing was seen as unlikely to cause an issue. There wasn't 100% consensus on this: this message has a dissenting view.

    2. "Using a more elaborate hash algorithm would slow down uses of numbers as dictionary keys and also be difficult to implement for non-integer types such as float, longs and complex numbers." (quote from this message)

    Essentially, the view that won out was that the security benefits of the change won out over the performance impact for strings and bytes objects, but not for numeric types.