python, python-3.x, cpython, python-internals

Causes for inconsistent behavior when adding NaNs to a set


Python's set behaves in a way I find puzzling when NaNs are involved:

>>> float('nan') in {float('nan')}    # example 1
False
>>> nan = float('nan')                # example 2
>>> nan in {nan}
True

At first I wrongly assumed that this was the behavior of the == operator, but that is obviously not the case, because both comparisons yield False, as expected:

>>> float('nan') == float('nan') 
False
>>> nan = float('nan')
>>> nan == nan
False

I'm mainly interested in the causes for this behavior. But if there is a way to ensure consistent behavior, that would also be nice to know!


Solution

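  • Set membership does an identity check as a short-circuit before considering an equality check (CPython source is in setobject.c, see also the note below PyObject_RichCompareBool). In example 1 the two float('nan') calls create two distinct objects, so the identity check fails and the == fallback returns False. In example 2 the same object appears on both sides, so the identity check succeeds and == is never consulted.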

    Python core devs are motivated by these invariants:

    for a in container:
        assert a in container    # this should ALWAYS be true
    

    Ergo:

    assert a in [a]
    assert a in (a,)
    assert a in {a}
    

    It was decided that ensuring these invariants was the top priority, and as for NaN: oh well. Special cases aren't special enough to break the rules. For all the gory details, see bpo issue 4296:

    Python assumes identity implies equivalence; contradicts NaN.
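
To make the short-circuit concrete, here is a minimal pure-Python sketch of the "identity first, then equality" rule, plus one possible way to get consistent results regardless of object identity. The helper names contains and contains_nan are made up for illustration; this only approximates the lookup semantics and is not how setobject.c is actually written (a real set also compares hashes before it gets this far).

```python
import math

def contains(container, x):
    # Hypothetical helper: approximates `x in container` by trying
    # identity first and falling back to == only if identity fails.
    return any(item is x or item == x for item in container)

def contains_nan(container):
    # Hypothetical helper: identity-independent NaN check.
    # The isinstance guard avoids a TypeError from math.isnan on non-numbers.
    return any(isinstance(item, float) and math.isnan(item) for item in container)

nan = float('nan')
other_nan = float('nan')                       # a second, distinct NaN object

same_object = contains([nan], nan)             # identity check succeeds
distinct_object = contains([nan], other_nan)   # identity fails, == is False

print(same_object, distinct_object)
print(contains_nan({float('nan')}), contains_nan({1.0, 2.0}))
```

If the goal is only "does this collection hold a NaN at all?", testing with math.isnan as above sidesteps the identity question entirely. Alternatively, always reusing a single NaN object (math.nan, for example) keeps `in` consistent, but only for as long as every NaN in the program really is that same object.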