Sets are collections of unordered values. If I construct a set via a set literal, e.g.
s = {'a', 'b', 'c'}
and then print it, I get the elements in some scrambled order. However, it seems that in Python 2.7, the above example always results in the same ordering:
print(s) # set(['a', 'c', 'b']) in Python 2.7
How does Python 2.7 decide on this ordering? Even the hashes of 'a'
, 'b'
and 'c'
are not in the order produced.
In Python 3.x (including 3.6 where dict
keys are ordered) the resulting order seems to be random, though always the same within a given Python process. That is, repeatedly re-building the set literal always lead to the same ordering, as long as I do not restart the Python interpreter.
To check the ordering across multiple Python processes, consider the bash code
(for _ in {1..50}; do python3 -c "s = {'a', 'b', 'c'}; print(s)"; done) | sort -u
This will (most often) show the 6 different ways the 3 elements can be arranged. Switching out python3
with python
(2), we only see the ordering ['a', 'c', 'b']
. What determines the ordering in Python 3?
I see that the hash
value of objects are deterministic in Python 2 while random (though constant within a Python process) in Python 3. I'm sure this is key to the full explanation.
As deceze writes in his comment, I would like to know if Python explicitly does something just to achieve this randomization, or if it happens "for free".
The reason for the difference in Python 3 (from Python 3.3 onwards) is that hash randomization is enabled by default, you could turn this off by setting the PYTHONHASHSEED
environmental variable to a fixed value:
$ export PYTHONHASHSEED=0
$ (for _ in {1..50}; do python3 -c "s = {'a', 'b', 'c'}; print(s)"; done) | sort -u
{'a', 'b', 'c'}
Equally you can turn hash randomization on in Python 2 with the -R
flag:
$ (for _ in {1..50}; do python2 -R -c "s = {'a', 'b', 'c'}; print(s)"; done) | sort -u
set(['a', 'b', 'c'])
set(['a', 'c', 'b'])
set(['b', 'c', 'a'])
set(['c', 'b', 'a'])
Note, you don't generally want to turn it off since having hash randomization enabled helps protect against certain denial-of-service attacks.