Search code examples
pythonlistoptimizationtuplespython-internals

Tuple or list when using 'in' in an 'if' clause?


Which approach is better? Using a tuple, like:

if number in (1, 2):

or a list, like:

if number in [1, 2]:

Which one is recommended for such uses and why (both logical and performance wise)?


Solution

  • The CPython interpreter replaces the second form with the first.

    That's because loading the tuple from a constant is one operation, but the list would be 3 operations; load the two integer contents and build a new list object.

    Because you are using a list literal that isn't otherwise reachable, it is substituted for a tuple:

    >>> import dis
    >>> dis.dis(compile('number in [1, 2]', '<stdin>', 'eval'))
      1           0 LOAD_NAME                0 (number)
                  3 LOAD_CONST               2 ((1, 2))
                  6 COMPARE_OP               6 (in)
                  9 RETURN_VALUE        
    

    Here the second bytecode loads a (1, 2) tuple as a constant, in one step. Compare this to creating a list object not used in a membership test:

    >>> dis.dis(compile('[1, 2]', '<stdin>', 'eval'))
      1           0 LOAD_CONST               0 (1)
                  3 LOAD_CONST               1 (2)
                  6 BUILD_LIST               2
                  9 RETURN_VALUE        
    

    Here N+1 steps are required for a list object of length N.

    This substitution is a CPython-specific peephole optimisation; see the Python/peephole.c source. For other Python implementations then, you want to stick with immutable objects instead.

    That said, the best option when using Python 3.2 and up, is to use a set literal:

    if number in {1, 2}:
    

    as the peephole optimiser will replace that with a frozenset() object and membership tests against sets are O(1) constant operations:

    >>> dis.dis(compile('number in {1, 2}', '<stdin>', 'eval'))
      1           0 LOAD_NAME                0 (number)
                  3 LOAD_CONST               2 (frozenset({1, 2}))
                  6 COMPARE_OP               6 (in)
                  9 RETURN_VALUE
    

    This optimization was added in Python 3.2 but wasn't backported to Python 2.

    As such, the Python 2 optimiser doesn't recognize this option and the cost of building either a set or frozenset from the contents is almost guaranteed to be more costly than using a tuple for the test.

    Set membership tests are O(1) and fast; testing against a tuple is O(n) worst case. Although testing against a set has to calculate the hash (higher constant cost, cached for immutable types), the cost for testing against a tuple other than the first element is always going to be higher. So on average, sets are easily faster:

    >>> import timeit
    >>> timeit.timeit('1 in (1, 3, 5)', number=10**7)  # best-case for tuples
    0.21154764899984002
    >>> timeit.timeit('8 in (1, 3, 5)', number=10**7)  # worst-case for tuples
    0.5670104179880582
    >>> timeit.timeit('1 in {1, 3, 5}', number=10**7)  # average-case for sets
    0.2663505630043801
    >>> timeit.timeit('8 in {1, 3, 5}', number=10**7)  # worst-case for sets
    0.25939063701662235