Why does the set function call wipe out the dupes, but parsing a set literal does not?
>>> x = Decimal('0')
>>> y = complex(0,0)
>>> set([0, x, y])
{0}
>>> {0, x, y}
{Decimal('0'), 0j}
(Python 2.7.12. Possibly same root cause as for this similar question)
Sets test for equality, and until there are new Python releases, the order in which they do so can differ depending on how you hand the values to the set being constructed, as I'll show below.
Since 0 == x is true and 0 == y is true, but x == y is false, the behaviour here is really undefined, as the set assumes that x == y must be true if the first two tests were true too.
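On Python 3 the Decimal/complex comparison no longer misbehaves, so here is a minimal sketch that reproduces the same order dependence with a made-up class. ZeroLike is purely hypothetical (it is not part of any library); it only mimics the situation above: both instances hash like 0 and compare equal to 0, but not to each other.

```python
class ZeroLike:
    """Hypothetical stand-in for Decimal('0') and 0j as they behave on Python 2.7."""

    def __init__(self, kind):
        self.kind = kind

    def __hash__(self):
        # Same hash as 0, so all three values land in the same hash bucket.
        return hash(0)

    def __eq__(self, other):
        if isinstance(other, ZeroLike):
            # Two ZeroLike objects of different kinds are *not* equal.
            return self.kind == other.kind
        # Any ZeroLike compares equal to the integer 0.
        return other == 0

    def __repr__(self):
        return "ZeroLike(%r)" % self.kind


x = ZeroLike("decimal")
y = ZeroLike("complex")

# 0 goes in first; x and y each compare equal to it and are dropped.
print(len(set([0, x, y])))  # 1

# y goes in first; x is not equal to y, so it is kept; 0 equals y and is dropped.
print(len(set([y, x, 0])))  # 2
```

The set never compares x against y in the first case, which is why the broken transitivity only shows up for some insertion orders.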
If you reverse the list passed to set(), then you get the same output as using a literal, because the order of equality tests changes:
>>> set([y, x, 0])
set([0j, Decimal('0')])
and the same for reversing the literal:
>>> {y, x, 0}
set([0])
What's happening is that the set literal loads the values onto the stack and then the stack values are added to the new set object in reverse order.
As long as 0 is added to the set first, the other two objects are then tested against the 0 already in the set. The moment one of the other two objects is added first, the equality test between those two fails and you get two objects added:
>>> {y, 0, x}
set([Decimal('0'), 0j])
>>> {x, 0, y}
set([0j, Decimal('0')])
That set literals add elements in reverse is a bug present in all versions of Python that support the syntax, all the way until Python 2.7.12 and 3.5.2. It was recently fixed, see issue 26020 (part of 2.7.13, 3.5.3 and 3.6, none of which have been released yet). If you look at 2.7.12, you can see that BUILD_SET in ceval.c reads the stack from the top down:
/* oparg is the number of elements to take from the stack to add */
for (; --oparg >= 0;) {
    w = POP();
    if (err == 0)
        err = PySet_Add(x, w);
    Py_DECREF(w);
}
while the bytecode pushes the elements onto the stack in source order (0 goes on the stack first), so popping from the top adds them to the set in reverse:
>>> from dis import dis
>>> dis(compile('{0, x, y}', '', 'eval'))
  2           0 LOAD_CONST               1 (0)
              3 LOAD_GLOBAL              0 (x)
              6 LOAD_GLOBAL              1 (y)
              9 BUILD_SET                3
             12 RETURN_VALUE
The fix is to read the elements from the stack in the order they were pushed, bottom first; the Python 2.7.13 version uses PEEK() instead of POP() (and a STACKADJ() to remove the elements from the stack afterwards):
for (i = oparg; i > 0; i--) {
    w = PEEK(i);
    if (err == 0)
        err = PySet_Add(x, w);
    Py_DECREF(w);
}
STACKADJ(-oparg);
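To make the difference between the two loops concrete, here is a rough Python model of them. It is only a sketch: a plain list stands in for the interpreter's value stack, and the function names add_order_pop and add_order_peek are my own, not anything from CPython. Each function returns the order in which PySet_Add would be called.

```python
def add_order_pop(stack, oparg):
    """Model of the 2.7.12 loop: POP() takes the top of the stack each time."""
    return [stack.pop() for _ in range(oparg)]


def add_order_peek(stack, oparg):
    """Model of the 2.7.13 loop: PEEK(i) reads the i-th item from the top."""
    # i runs from oparg down to 1, so the deepest element is read first.
    order = [stack[-i] for i in range(oparg, 0, -1)]
    # Equivalent of STACKADJ(-oparg): drop the consumed elements afterwards.
    del stack[len(stack) - oparg:]
    return order


# The bytecode for {0, x, y} pushes 0, then x, then y.
pushed = ["0", "x", "y"]

print(add_order_pop(list(pushed), 3))   # ['y', 'x', '0']
print(add_order_peek(list(pushed), 3))  # ['0', 'x', 'y']
```

With the old loop, y reaches the set first, which matches the two-element result for {0, x, y} shown earlier.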
The equality testing issue has the same root cause as the other question; the Decimal() class has some equality issues with complex here, which were fixed in Python 3.2 (by making Decimal() support comparisons to complex and a few other numeric types it didn't support before).
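As a quick check, assuming you run it on Python 3.2 or newer, the comparison is consistent and the element order no longer matters:

```python
from decimal import Decimal

x = Decimal('0')
y = complex(0, 0)

# Decimal now compares equal to a complex with a zero imaginary part.
print(x == y)  # True

# All three values are equal and hash alike, so any order yields one element.
print(len({0, x, y}), len({y, x, 0}), len(set([0, x, y])))  # 1 1 1
```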