I assumed that when we iterate through a set, the iteration order follows the increasing order of the slot indices (the hashes modulo the table size) in the hash table that backs the set. I wanted to test this, so I wrote some code.
I read somewhere that in Python the hash table doubles in size once it is more than 60% full. Initially it has eight slots, so after I fill six of them it should have been resized to 16 slots, and the slot index of each element should then be `hash(x) % 16`.
>>> t = tuple(hash(i)%16 for i in (chr(i) for i in range(97, 103)))
This creates a tuple `t` of the hashes, modulo 16, of the characters `'a'` through `'f'`. I repeated this until I got no collisions, ending up with `t = (4, 3, 1, 7, 12, 9)`. I therefore expected `print(set('abcdef'))` to display `{'c', 'b', 'a', 'd', 'f', 'e'}`, but the actual output was `{'c', 'd', 'f', 'b', 'a', 'e'}`.
However, if I instead did
>>> t = tuple(hash(i)%32 for i in (chr(i) for i in range(97, 103)))
I got `t = (20, 19, 1, 7, 28, 9)`, which predicts the order `{'c', 'd', 'f', 'b', 'a', 'e'}` correctly.
Every time I repeated this with six elements and `% 32`, the right order was predicted.
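To rule out the hash randomization of strings (their hashes vary between runs via `PYTHONHASHSEED`), here is a sketch of the same check with small integers, whose hashes are deterministic (`hash(n) == n` for small `n`). It assumes a 32-slot table, which is a CPython implementation detail:

```python
# Six small ints chosen so their slot indices mod 32 have no collisions:
# 33 % 32 == 1, 2 % 32 == 2, 67 % 32 == 3, 100 % 32 == 4, 5 % 32 == 5, 40 % 32 == 8
s = {33, 2, 67, 100, 5, 40}

# Predicted iteration order: elements sorted by their slot index hash(x) % 32.
predicted = sorted(s, key=lambda x: hash(x) % 32)

print(list(s))
print(predicted)
# On CPython both print [33, 2, 67, 100, 5, 40]
```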
So, what's going on here? Is the hash table's size 32? Why is that?
You're looking at implementation details, and then generalizing from them. You left out one critical detail: for small sets, the hash table is resized to four times the number of used entries, not double. With six elements, the fifth insertion pushes the fill over the 3/5 load factor and triggers a resize to 4 × 5 = 20 entries, which rounds up to the next power of two: 32 slots. See also: What is the internal load factor of a set in Python?
Note that this is all subject to change. As the linked question notes, the load factor changed from 2/3 to 3/5 in 2017. Reasonable code does not depend on any of this.
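The resize can be observed indirectly with `sys.getsizeof` (again a CPython-specific sketch, since the reported sizes are implementation details): the initial 8-slot table is allocated inline in the set object, so the reported size stays flat through the fourth insertion and jumps when the fifth element pushes the fill past the 3/5 load factor.

```python
import sys

s = set()
sizes = [sys.getsizeof(s)]
for n in range(1, 7):
    s.add(n)
    sizes.append(sys.getsizeof(s))

print(sizes)
# On CPython, sizes[0] through sizes[4] are equal (8 inline slots), then
# sizes[5] jumps: the fifth insertion makes the fill 5/8 > 3/5, so the
# table is resized to hold 4 * 5 = 20 entries, rounded up to 32 slots.
# The sixth insertion (fill 6/32) causes no further growth.
```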