
Causes for inconsistent behavior when adding NaNs to a set

Python's set behaves in a puzzling way (at least to me) when combined with NaNs:

>>> float('nan') in {float('nan')}    # example 1
False
>>> nan = float('nan')                # example 2
>>> nan in {nan}
True

At first, I wrongly assumed that this was the behavior of the == operator, but that is clearly not the case, because both comparisons yield False, as expected:

>>> float('nan') == float('nan') 
False
>>> nan = float('nan')
>>> nan == nan
False

I'm mainly interested in the causes of this behavior. But if there is a way to ensure consistent behavior, that would be nice to know, too!

Solution

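Set membership performs an identity check as a short-circuit before falling back to an equality check (the CPython source is in Objects/setobject.c; see also the note below PyObject_RichCompareBool in the C API documentation). In example 2, both operands are the same object, so the identity check succeeds and == is never consulted. In example 1, each call to float('nan') creates a distinct object, so the identity check fails, and equality cannot rescue the lookup because a NaN never compares equal to anything.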
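As a rough model (a simplification that ignores hashing and probing, not CPython's actual code), the lookup behaves like the hypothetical helper `model_contains` below: an `is` check first, and only if that fails, `==`.

```python
def model_contains(items, target):
    """Hypothetical pure-Python model of CPython's membership test:
    identity is checked first, equality only as a fallback."""
    for item in items:
        # `or` short-circuits, so == is skipped when `is` already succeeds.
        if item is target or item == target:
            return True
    return False

nan = float('nan')

# Same object: the identity check succeeds, == is never consulted.
print(model_contains([nan], nan))                    # True
# Two distinct NaN objects: identity fails, and NaN == NaN is False.
print(model_contains([float('nan')], float('nan')))  # False
```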

The Python core developers are guided by this invariant:

for a in container:
    assert a in container    # this should ALWAYS be true

Ergo:

assert a in [a]
assert a in (a,)
assert a in {a}
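These invariants can be checked directly for NaN. The quick check below (assuming CPython 3) shows that lists, tuples, sets and dict keys all find the very same NaN object even though it does not compare equal to itself, while a different NaN object is not found:

```python
nan = float('nan')

# Equality alone says a NaN is not equal to itself ...
assert not (nan == nan)

# ... yet every built-in container finds the identical object,
# because containment checks identity before equality.
assert nan in [nan]
assert nan in (nan,)
assert nan in {nan}
assert nan in {nan: 'value'}

# A *different* NaN object is not found: identity fails and == is False.
assert float('nan') not in [nan]
assert float('nan') not in {nan}
print("all invariants hold")
```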

It was decided that preserving these invariants takes priority, and as for NaN: oh well. Special cases aren't special enough to break the rules. For all the gory details, see bpo issue4296:

Python assumes identity implies equivalence; contradicts NaN.
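As for getting consistent behavior: one possible workaround (a sketch of my own, not something proposed in the bpo discussion) is to treat every float NaN as a match explicitly with `math.isnan`, so that both examples from the question return True. The helper name `contains_nan_aware` is hypothetical:

```python
import math

def contains_nan_aware(container, value):
    """Hypothetical helper: like `value in container`, but treats any
    float NaN as matching any other float NaN, regardless of identity."""
    if isinstance(value, float) and math.isnan(value):
        # A distinct NaN never compares equal, so scan for any NaN member.
        return any(isinstance(x, float) and math.isnan(x) for x in container)
    return value in container

nan = float('nan')
print(contains_nan_aware({float('nan')}, float('nan')))  # True
print(contains_nan_aware({nan}, nan))                    # True
print(contains_nan_aware({1.0, 2.0}, float('nan')))      # False
print(contains_nan_aware({1.0, 2.0}, 2))                 # True
```

The NaN branch is a linear scan, so NaN queries lose the O(1) average lookup of a set, and only `float` NaNs (including subclasses) are recognized.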