python, python-3.x, list, optimization

Python Lists: Why is the Pythonic list comprehension 8-20% slower?


(Update: The issue has been found. See my answer. This turned out to be a performance issue in Python 3.11, and only affects small lists. For Python 3.9, 3.10 and 3.12, list comprehension is ALWAYS faster, regardless of list size.)

I've seen claims that "list comprehension is always faster because it avoids the lookup and calling of the append() method".

The difference is how appending is done:

  • List comprehensions use just two bytecodes, LOAD_FAST(k); LIST_APPEND.
  • Regular .append() calls use five bytecodes, one of which is a method lookup (hash table), LOAD_METHOD(append); LOAD_FAST(k); PRECALL; CALL; POP_TOP.

So obviously, list comprehension should be faster since it avoids costly function lookups/calls...
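
To see those opcode sequences yourself, here is a minimal sketch (the helper names are made up for illustration; the comments simply restate the opcode claims above for CPython 3.11):

import dis


def with_append(items):
    out = []
    for k in items:
        out.append(k)  # LOAD_METHOD(append); LOAD_FAST(k); PRECALL; CALL; POP_TOP
    return out


def with_listcomp(items):
    return [k for k in items]  # the inner loop is just LOAD_FAST(k); LIST_APPEND


dis.dis(with_append)
dis.dis(with_listcomp)
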

But no matter how I twist this, I see two things:

  • List comprehension is 8-20% slower than .append(). A friend verified this on their computer too.
  • The disassembly for list comprehension is way longer, which may explain why it's slower.

Does anyone know what's going on? Why is the Pythonic "list comprehension" so much slower?

I am running Python 3.11 on Linux; my friend is running Windows. So it's slower on both platforms.

My code is below.

from typing import Any


def filter_unset_loop(
    settings: dict[str, Any],
    new_settings: dict[str, Any],
) -> dict[str, Any]:
    remove_keys: list[str] = []
    for k, v in new_settings.items():
        if (v is None or k == "self" or k == "config"):
            remove_keys.append(k)

    for k in remove_keys:
        del new_settings[k]

    return new_settings


def filter_unset_listcomp(
    settings: dict[str, Any],
    new_settings: dict[str, Any],
) -> dict[str, Any]:
    remove_keys: list[str] = [
        k for k, v in new_settings.items() if (v is None or k == "self" or k == "config")
    ]

    for k in remove_keys:
        del new_settings[k]

    return new_settings


def test_loop():
    filter_unset_loop(
        {},
        {
            "self": True,
            "what": None,
            "hello": "world",
            "great": "key",
            "1": 1,
            "2": 1,
            "3": 1,
            "4": 1,
            "5": 1,
            "6": 1,
            "7": 1,
            "8": 1,
            "9": 1,
            "10": 1,
            "11": 1,
            "12": 1,
            "13": 1,
            "14": 1,
            "15": 1,
            "16": 1,
        },
    )



def test_listcomp():
    filter_unset_listcomp(
        {},
        {
            "self": True,
            "what": None,
            "hello": "world",
            "great": "key",
            "1": 1,
            "2": 1,
            "3": 1,
            "4": 1,
            "5": 1,
            "6": 1,
            "7": 1,
            "8": 1,
            "9": 1,
            "10": 1,
            "11": 1,
            "12": 1,
            "13": 1,
            "14": 1,
            "15": 1,
            "16": 1,
        },
    )


import timeit
n = 20_000_000
t1 = timeit.Timer(test_loop)
print("loop (seconds):", t1.timeit(number=n))
t2 = timeit.Timer(test_listcomp)
print("listcomp (seconds):", t2.timeit(number=n))

import dis

print("\nLOOP DISASSEMBLY:\n")

dis.dis(filter_unset_loop)

print("\n" * 10)
print("\nLISTCOMP DISASSEMBLY:\n")

dis.dis(filter_unset_listcomp)

Solution

  • TL;DR answer:

    • In Python 3.12, list comprehension is always faster for everything.
    • In Python 3.11, list comprehension is always slower for small lists but always faster for large lists.
    • In Python 3.10 and 3.9, list comprehension is always faster for everything.

    So the issue only happens in Python 3.11. They most likely changed some code related to initialization of list comprehensions in that version, which gave it an unfair overhead penalty. They've fixed it in 3.12, and it wasn't an issue in 3.9 and 3.10.

    Recommendation: Use list comprehensions. They are better. :)


    Longer answer and detailed tests:

    Alright, the answer is that it depends.

    It depends on how many .append() operations you will be doing, and it also depends on your Python version!

    For Python 3.12, 3.10 and 3.9, List Comprehension is UNIVERSALLY FASTER (see bottom/final edit of this answer for more info about that).

    For Python 3.11, List Comprehension is slower in some cases, as follows:

    • For short loops with only a few .append() calls, the list comprehension's setup overhead makes it slower, as seen in my original question.
    • But as the number of loop iterations grows, and with it the number of .append() calls, the list comprehension begins to gain a performance lead.

    Here's the updated test, where I generate a dictionary with over 10,000 keys. With that many elements (and therefore that many .append() calls), the loop/.append() variant turned out to be about 6.5% slower than the list comprehension on Python 3.11, basically inverting the earlier result.

    List comprehension is preferable: it keeps the code cleaner, is always faster once there are lots of appends, and makes your code more future-proof.

    Here's my modified test with 10 000 items:

    from typing import Any
    
    
    def filter_unset_loop(
        settings: dict[str, Any],
        new_settings: dict[str, Any],
    ) -> dict[str, Any]:
        remove_keys: list[str] = []
        for k, v in new_settings.items():
            if (v is None or k == "self" or k == "config"):
                remove_keys.append(k)
    
        for k in remove_keys:
            del new_settings[k]
    
        return new_settings
    
    
    def filter_unset_listcomp(
        settings: dict[str, Any],
        new_settings: dict[str, Any],
    ) -> dict[str, Any]:
        remove_keys: list[str] = [
            k for k, v in new_settings.items() if (v is None or k == "self" or k == "config")
        ]
    
        for k in remove_keys:
            del new_settings[k]
    
        return new_settings
    
    
    def make_dict(out: dict[str, Any]) -> dict[str, Any]:
        for k in range(10000):
            out[str(k)] = 1
        
        return out
    
    
    test_dict_template = make_dict({
        "self": True,
        "what": None,
        "hello": "world",
        "great": "key",
    })
    
    
    def test_loop():
        filter_unset_loop(
            {},
            test_dict_template.copy(),
        )
    
    
    
    def test_listcomp():
        filter_unset_listcomp(
            {},
            test_dict_template.copy(),
        )
    
    
    import timeit
    n = 100_000
    t1 = timeit.Timer(test_loop)
    print("loop (seconds):", t1.timeit(number=n))
    t2 = timeit.Timer(test_listcomp)
    print("listcomp (seconds):", t2.timeit(number=n))
    
    import dis
    
    print("\nLOOP DISASSEMBLY:\n")
    
    dis.dis(filter_unset_loop)
    
    print("\n" * 10)
    print("\nLISTCOMP DISASSEMBLY:\n")
    
    dis.dis(filter_unset_listcomp)
    

    Update:

    Since Juan wanted me to do a test without any del statement, and with a global dictionary (so that it isn't recreated for every test), I have now done that.

    When I do that, List Comprehension is 9% slower than for-append for me, in Python 3.11.

    It really seems to come down to the way that list comprehensions generate a lot more bytecode to "prepare the list comprehension", which is its own form of overhead.

    (Edit: This list comprehension performance issue only happens in Python 3.11, as further testing below revealed.)
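
    For anyone who wants to see that setup overhead directly, here is a tiny sketch (just a disassembly of a bare comprehension; on Python 3.11 the output shows the hidden <listcomp> function being built and called before the loop even starts):

    import dis

    # dis only compiles the string, it never runs it, so "data" doesn't need to exist.
    # On 3.11 the outer code loads the <listcomp> code object, makes a function out of
    # it, gets the iterator and calls that function; the nested <listcomp> body's inner
    # loop is just FOR_ITER / STORE_FAST / LOAD_FAST / LIST_APPEND.
    dis.dis("[k for k in data]")

    Here is the no-del, global-dictionary test itself: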

    from typing import Any
    
    
    def filter_unset_loop(
        settings: dict[str, Any],
        new_settings: dict[str, Any],
    ) -> dict[str, Any]:
        remove_keys: list[str] = []
        for k, v in new_settings.items():
            if (v is None or k == "self" or k == "config"):
                remove_keys.append(k)
    
        return new_settings
    
    
    def filter_unset_listcomp(
        settings: dict[str, Any],
        new_settings: dict[str, Any],
    ) -> dict[str, Any]:
        remove_keys: list[str] = [
            k for k, v in new_settings.items() if (v is None or k == "self" or k == "config")
        ]
    
        return new_settings
    
    
    settings: dict[str, Any] = {}
    new_settings: dict[str, Any] = {
        "self": True,
        "what": None,
        "hello": "world",
        "great": "key",
        "1": 1,
        "2": 1,
        "3": 1,
        "4": 1,
        "5": 1,
        "6": 1,
        "7": 1,
        "8": 1,
        "9": 1,
        "10": 1,
        "11": 1,
        "12": 1,
        "13": 1,
        "14": 1,
        "15": 1,
        "16": 1,
    }
    
    
    def test_loop():
        filter_unset_loop(
            settings,
            new_settings,
        )
    
    
    
    def test_listcomp():
        filter_unset_listcomp(
            settings,
            new_settings,
        )
    
    
    import timeit
    n = 30_000_000
    t1 = timeit.Timer(test_loop)
    print("loop (seconds):", t1.timeit(number=n))
    t2 = timeit.Timer(test_listcomp)
    print("listcomp (seconds):", t2.timeit(number=n))
    

    Final Update: Reason found! Python 3.12 has optimized List Comprehensions!

    Since some commenters wanted to use timeit to check the differences but used incorrect timeit commands, here are accurate comparisons that use local variables and an appropriate number of iterations.

    Most importantly, I found out what's going on: Python 3.12 has optimized list comprehensions (most likely thanks to PEP 709, which inlines comprehensions so they no longer need their own hidden function and frame). In Python 3.11, list comprehensions are 22% slower than .append() for small lists; in Python 3.12 they are universally faster.

    $ pyenv local 3.11  
    
    $ python --version    
    Python 3.11.7
    
    $ python -m timeit -n 1000 "x=[]" "for i in range(100_000): x.append(i)"
    1000 loops, best of 5: 1.72 msec per loop
    
    $ python -m timeit -n 1000 "x=[i for i in range(100_000)]"  
    1000 loops, best of 5: 1.34 msec per loop
    
    $ python -m timeit -n 10000000 "x=[]" "for i in range(10): x.append(i)"
    10000000 loops, best of 5: 191 nsec per loop
    
    $ python -m timeit -n 10000000 "x=[i for i in range(10)]"
    10000000 loops, best of 5: 234 nsec per loop
    
    $ pyenv local 3.12                                       
    
    $ python --version
    Python 3.12.1
    
    $ python -m timeit -n 1000 "x=[]" "for i in range(100_000): x.append(i)"
    1000 loops, best of 5: 2.36 msec per loop
    
    $ python -m timeit -n 1000 "x=[i for i in range(100_000)]"
    1000 loops, best of 5: 1.69 msec per loop
    
    $ python -m timeit -n 10000000 "x=[]" "for i in range(10): x.append(i)"
    10000000 loops, best of 5: 204 nsec per loop
    
    $ python -m timeit -n 10000000 "x=[i for i in range(10)]"
    10000000 loops, best of 5: 160 nsec per loop
    

    Update with fixed version of @JonSG's code, to add another data point to this.

    JonSG's answer had some good ideas, but his numbers weren't meaningful: he ran so few iterations of each test that the results were dominated by measurement noise. (His code ran each test for just a few nanoseconds...)

    I rewrote the test to run an appropriate number of iterations for each list size, to average out variations from background tasks, random CPU cache stutters, CPU throttling, fluctuations in CPU boost frequency, etc.

    With that fix in place, his tests confirm what I discovered above:

    • Python 3.12 is universally faster when using List Comprehensions.
    • Python 3.11 is faster with for-append() loops for small lists, but the loop falls behind once the lists get longer. The slowness of comprehensions for small lists is probably because a list comprehension has to create a new function and stack frame, which requires a lot of extra bytecode, so there is real overhead just for "starting" a list comprehension in Python 3.11. That setup was clearly not well optimized in that version. (Edit: this performance issue only affects Python 3.11; see the further tests of older Python versions below, and the small frame-inspection sketch right after this list.)
    • My recommendation/conclusion is still: despite the bad results in Python 3.11, I will keep using list comprehensions. They are the most future-proof choice and are always best once the list is long, so in my opinion the small edge cases ("small lists on certain Python versions") aren't worth worrying about.
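
    To sanity-check the "new stack frame" explanation, here is a small sketch (my own illustration, not from JonSG): on 3.11 and older, a function containing a list comprehension carries a nested <listcomp> code object, i.e. a hidden single-use function that gets its own frame on every call, whereas on 3.12 the comprehension is inlined and that nested code object disappears:

    import sys


    def demo(data):
        return [x for x in data]


    # List the names of any code objects hiding among demo()'s constants:
    # on 3.11 this prints ['<listcomp>'], on 3.12 it prints [] because the
    # comprehension body was inlined into demo() itself (PEP 709).
    nested = [c.co_name for c in demo.__code__.co_consts if hasattr(c, "co_name")]
    print(sys.version_info[:2], nested)

    Here is the fixed version of JonSG's benchmark: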
    import sys
    import timeit
    
    def test_loop(n):
        foo = []
        for i in range(n):
            foo.append(i)
        return foo
    
    def test_comprehension(n):
        foo = [i for i in range(n)]
        return foo
    
    list_size_steps = 7  # 10 to 10 million
    list_size = 10
    num_elements = 60_000_000
    
    print(f"*** python: {sys.version.split(' ')[0]} ***")
    for _ in range(list_size_steps):
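        # Scale the iteration count inversely with the list size, so every step does
        # roughly the same total amount of work (num_elements appends in total).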
        test_num = int(num_elements / list_size)
        print(f"{list_size=}, {test_num=}")
    
        time_loops = timeit.timeit("test_loop(list_size)", globals=globals(), number=test_num)
        time_comprehension = timeit.timeit("test_comprehension(list_size)", globals=globals(), number=test_num)
        pct = round(time_loops / time_comprehension, 4)
        print("\tLoops vs Comp: {:3.0%} :: {:1.4f} vs {:1.4f}".format(pct, time_loops, time_comprehension))
    
        list_size *= 10
        print()
    

    Result:

    Python 3.12.1:

    *** python: 3.12.1 ***
    list_size=10, test_num=6000000
        Loops vs Comp: 121% :: 1.3279 vs 1.0990
    
    list_size=100, test_num=600000
        Loops vs Comp: 143% :: 0.8200 vs 0.5727
    
    list_size=1000, test_num=60000
        Loops vs Comp: 128% :: 1.0748 vs 0.8421
    
    list_size=10000, test_num=6000
        Loops vs Comp: 124% :: 1.1820 vs 0.9527
    
    list_size=100000, test_num=600
        Loops vs Comp: 120% :: 1.5919 vs 1.3287
    
    list_size=1000000, test_num=60
        Loops vs Comp: 116% :: 1.7638 vs 1.5161
    
    list_size=10000000, test_num=6
        Loops vs Comp: 114% :: 1.9591 vs 1.7146
    

    Python 3.11.7:

    *** python: 3.11.7 ***
    list_size=10, test_num=6000000
        Loops vs Comp: 81% :: 1.3323 vs 1.6385 <--- THIS!
    
    list_size=100, test_num=600000
        Loops vs Comp: 129% :: 0.7914 vs 0.6140
    
    list_size=1000, test_num=60000
        Loops vs Comp: 116% :: 0.8822 vs 0.7592
    
    list_size=10000, test_num=6000
        Loops vs Comp: 113% :: 0.9733 vs 0.8584
    
    list_size=100000, test_num=600
        Loops vs Comp: 112% :: 1.3436 vs 1.2003
    
    list_size=1000000, test_num=60
        Loops vs Comp: 111% :: 1.5391 vs 1.3831
    
    list_size=10000000, test_num=6
        Loops vs Comp: 109% :: 1.7964 vs 1.6524
    

    Update: Decided to add numbers for Python 3.10 and 3.9 too.

    As you can see, the list comprehension performance issue only affected Python 3.11. Older and newer versions are not affected, at least among the Python versions I've tested (3.9, 3.10, 3.11, 3.12).

    Python 3.10.13:

    *** python: 3.10.13 ***
    list_size=10, test_num=6000000
        Loops vs Comp: 125% :: 2.3019 vs 1.8439
    
    list_size=100, test_num=600000
        Loops vs Comp: 217% :: 1.6378 vs 0.7552
    
    list_size=1000, test_num=60000
        Loops vs Comp: 201% :: 1.8167 vs 0.9029
    
    list_size=10000, test_num=6000
        Loops vs Comp: 190% :: 1.7905 vs 0.9442
    
    list_size=100000, test_num=600
        Loops vs Comp: 158% :: 2.2074 vs 1.3967
    
    list_size=1000000, test_num=60
        Loops vs Comp: 151% :: 2.4325 vs 1.6135
    
    list_size=10000000, test_num=6
        Loops vs Comp: 142% :: 2.5604 vs 1.8094
    

    Python 3.9.18:

    *** python: 3.9.18 ***
    list_size=10, test_num=6000000
        Loops vs Comp: 129% :: 2.1633 vs 1.6834
    
    list_size=100, test_num=600000
        Loops vs Comp: 190% :: 1.4944 vs 0.7847
    
    list_size=1000, test_num=60000
        Loops vs Comp: 194% :: 1.8775 vs 0.9685
    
    list_size=10000, test_num=6000
        Loops vs Comp: 158% :: 2.3659 vs 1.4978
    
    list_size=100000, test_num=600
        Loops vs Comp: 151% :: 2.4121 vs 1.5974
    
    list_size=1000000, test_num=60
        Loops vs Comp: 147% :: 2.5078 vs 1.7022
    
    list_size=10000000, test_num=6
        Loops vs Comp: 139% :: 2.7699 vs 1.9927