Search code examples
pythonhashsethashtable

Order of a set in Python


I assumed that when we iterate through a set, the order in which the iteration occurs is based on the increasing order of the hashes of the hash table that corresponds to the set. I wanted to test this out and wrote a code for the same.

I read somewhere that the hash table doubles in size in Python when it is filled over 60%. Initially, it has eight slots. So, if I fill up six of those slots, it should have 16 slots due to resizing. The hashes stored in them should be hash(x)%16, then.

>>> t = tuple(hash(i)%16 for i in (chr(i) for i in range(97, 103)))

This creates a tuple t with modulo 16 of the hashes of characters 'a' through 'f'. I repeated this till I got no collisions and got t = (4, 3, 1, 7, 12, 9). I expected that when I print set('abcdef'), it would display {'c','b','a','d','f','e'}, but the output was {'c','d','f','b','a','e'}.

However, if I instead did

>>> t = tuple(hash(i)%32 for i in (chr(i) for i in range(97, 103)))

I got t = (20, 19, 1, 7, 28, 9), which predicts the order {'c','d','f','b','a','e'} correctly.

Every time I did it for six elements and '%32', the right order was predicted.

So, what's going on here? Is the hash table's size 32? Why is that?


Solution

  • You're looking at implementation details, and then summarizing them. You left out one critical detail: for small sets, the hash table size is multiplied by 4. What is the internal load factor of a sets in Python

    Note that this is all subject to change. As the linked question notes, the load factor has changed from 2/3 to 3/5 in 2017. Reasonable code does not depend on this.