It's pretty common to test a variable against a bunch of constants by writing
if x in ('foo', 'bar', 'baz'):
rather than
if x == 'foo' or x == 'bar' or x == 'baz':
I've seen a lot of "use {'foo', 'bar', 'baz'} instead of ('foo', 'bar', 'baz') for O(1) performance," which makes sense, but testing shows pretty strange results:
%timeit 1 in {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
27.6 ns ± 2.35 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit 10 in {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
136 ns ± 4.04 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit 0 in {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
186 ns ± 26.5 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
Why is the lookup for set literals not constant time?
Set lookup is an O(1) operation on average. Which element of the set you check for should not consistently change the performance; at most there's some random variation, since some values happen to collide with other values' hashes and so take longer to find. The time differences you're seeing when looking up different values in your small sets are almost certainly coincidence, or noise that you're mistaking for data.
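If you want to see collisions at work, you can force them deliberately. Here's a minimal sketch using a hypothetical Collider class (not anything from your benchmark) whose instances all hash to the same value, so every element lands in one bucket and lookups have to probe past the others:

import timeit

class Collider:
    # Hypothetical type for illustration: a constant __hash__ forces
    # every instance into the same hash bucket.
    def __init__(self, n):
        self.n = n
    def __hash__(self):
        return 42
    def __eq__(self, other):
        return isinstance(other, Collider) and self.n == other.n

colliding = {Collider(i) for i in range(100)}  # 100 entries, one bucket
normal = set(range(100))                       # small ints rarely collide
target = Collider(99)
print(timeit.timeit(lambda: target in colliding, number=100000))
print(timeit.timeit(lambda: 99 in normal, number=100000))

The colliding lookups have to call __eq__ over and over as the probe walks past other entries, so they come out noticeably slower even though both sets hold 100 elements.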
Note that you're not just timing set membership in your tests. You're also creating a new set each time, and that is usually an O(N) operation (where N is the number of values in the set). In some special situations a set literal can be created in O(1) time, because the Python compiler performs an optimization: it replaces the mutable set object with an immutable frozenset object that it has computed in advance as a constant. That only happens in situations where the compiler expects the object to be recreated many times, and where it can tell that no reference to the set object can leak out of the area of code it's running. For instance, a set used in the if clause of a comprehension or generator expression may get the constant treatment:
[foo(x) for x in some_iterable if x in {0, 1, 2, 3, 4, 5, 6, 7, 9}]
In recent versions of CPython, the set literal here will always refer to a constant frozenset that doesn't need to be recreated for each x value yielded from some_iterable. But you probably shouldn't rely upon this behavior, as other Python interpreters, and even other versions of CPython, may not perform the same optimization.
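If you want to see whether your interpreter does this, the dis module will show the set literal stored as a frozenset constant (this is just a quick check; the exact bytecode varies from one CPython version to the next):

import dis

# In recent CPython, the {1, 2, 3} literal is compiled to a single
# frozenset constant rather than rebuilt on each evaluation; look for
# a LOAD_CONST with frozenset({1, 2, 3}) in the output.
dis.dis("x in {1, 2, 3}")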
This can't explain what you're seeing in your timings, though. I suspect there's some artifact in your environment that explains the issue, or it may just be random chance that the smallest value in the set happens to have no hash collisions while the last one, by coincidence, has several. If you test other values in the set, you'll probably get a small range of different timings. But that range won't tend to vary much with the number of set elements; it should be fairly similar for every size of set (there may be small differences, but much less than a factor of N).
Try a more specific test (with set creation factored out), like this:
import timeit, random

big_set = set(range(1000000))

for x in random.sample(range(1000000), 10):
    # timeit.timeit runs the lambda 1,000,000 times by default and
    # returns the total time in seconds
    print('looking up', x, 'took', timeit.timeit(lambda: x in big_set), 'seconds')
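To convince yourself that lookups really are O(1) on average, you can also time a membership test against sets of increasing size. A rough sketch (my own, not part of your benchmark) would be:

import timeit

for n in (10, 1000, 100000, 1000000):
    s = set(range(n))
    # time 1,000,000 lookups of an element that is always present
    t = timeit.timeit(lambda: (n - 1) in s, number=1000000)
    print(n, 'elements:', t, 'seconds for 1,000,000 lookups')

If lookup were O(N), the last line would take roughly 100,000 times longer than the first; in practice the numbers should stay in the same ballpark.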