(Update: The issue has been found. See my answer. This turned out to be a performance issue in Python 3.11, and only affects small lists. For Python 3.9, 3.10 and 3.12, list comprehension is ALWAYS faster, regardless of list size.)
I've seen claims that "list comprehension is always faster because it avoids the lookup and calling of the .append() method".
The difference is how appending is done. A list comprehension appends with just two bytecodes, LOAD_FAST(k); LIST_APPEND, whereas .append() calls use five bytecodes, one of which is a method lookup (hash table): LOAD_METHOD(append); LOAD_FAST(k); PRECALL; CALL; POP_TOP. So obviously, list comprehension should be faster, since it avoids costly method lookups/calls...
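You can verify those bytecodes yourself with a minimal standalone snippet (the full test script below prints the same disassemblies):
import dis

def with_append():
    out = []
    for i in range(3):
        out.append(i)  # on 3.11: LOAD_METHOD(append); LOAD_FAST(i); PRECALL; CALL; POP_TOP
    return out

def with_listcomp():
    # The comprehension body appends via a single LIST_APPEND bytecode;
    # dis.dis recurses into the nested comprehension code object.
    return [i for i in range(3)]

dis.dis(with_append)
dis.dis(with_listcomp)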
But no matter how I twist this, I see two things: the list comprehension version is consistently slower than the .append() loop, and a friend verified this on his computer too. Does anyone know what's going on? Why is the Pythonic "list comprehension" so much slower?
I am running Python 3.11 on Linux; he is running Windows. So it's slower on both platforms.
My code is below.
from typing import Any
def filter_unset_loop(
settings: dict[str, Any],
new_settings: dict[str, Any],
) -> dict[str, Any]:
remove_keys: list[str] = []
for k, v in new_settings.items():
if (v is None or k == "self" or k == "config"):
remove_keys.append(k)
for k in remove_keys:
del new_settings[k]
return new_settings
def filter_unset_listcomp(
settings: dict[str, Any],
new_settings: dict[str, Any],
) -> dict[str, Any]:
remove_keys: list[str] = [
k for k, v in new_settings.items() if (v is None or k == "self" or k == "config")
]
for k in remove_keys:
del new_settings[k]
return new_settings
def test_loop():
filter_unset_loop(
{},
{
"self": True,
"what": None,
"hello": "world",
"great": "key",
"1": 1,
"2": 1,
"3": 1,
"4": 1,
"5": 1,
"6": 1,
"7": 1,
"8": 1,
"9": 1,
"10": 1,
"11": 1,
"12": 1,
"13": 1,
"14": 1,
"15": 1,
"16": 1,
},
)
def test_listcomp():
filter_unset_listcomp(
{},
{
"self": True,
"what": None,
"hello": "world",
"great": "key",
"1": 1,
"2": 1,
"3": 1,
"4": 1,
"5": 1,
"6": 1,
"7": 1,
"8": 1,
"9": 1,
"10": 1,
"11": 1,
"12": 1,
"13": 1,
"14": 1,
"15": 1,
"16": 1,
},
)
import timeit
n = 20_000_000
t1 = timeit.Timer(test_loop)
print("loop (seconds):", t1.timeit(number=n))
t2 = timeit.Timer(test_listcomp)
print("listcomp (seconds):", t2.timeit(number=n))
import dis
print("\nLOOP DISASSEMBLY:\n")
dis.dis(filter_unset_loop)
print("\n" * 10)
print("\nLISTCOMP DISASSEMBLY:\n")
dis.dis(filter_unset_listcomp)
So the issue only happens in Python 3.11. That version most likely changed some code related to the initialization of list comprehensions, which gave them an unfair overhead penalty. It's fixed in 3.12 (which inlines list comprehensions, see PEP 709), and it wasn't an issue in 3.9 and 3.10.
Recommendation: Use list comprehensions. They are better. :)
Alright, the answer is that it depends.
It depends on how many .append() operations you will be doing, and it also depends on your Python version!
For Python 3.12, 3.10 and 3.9, list comprehension is UNIVERSALLY FASTER (see the bottom/final edit of this answer for more about that).
For Python 3.11, list comprehension is slower in some cases, as follows:
- With only a few .append() calls, the list comprehension's overhead means that it's slower, as seen in my original question.
- With lots of .append() calls, the list comprehension begins to gain a performance lead.
Here's the updated test where I generate a dictionary with over 10,000 keys, which made the loop/.append() variant about 6.5% slower than the list comprehension due to having so many elements (tons of append() calls), basically inverting the result, when testing this in Python 3.11.
List comprehension is still preferable: it keeps the code cleaner, is always faster once there are lots of appends, and makes your code more future-proof.
Here's my modified test with 10,000 items:
from typing import Any
def filter_unset_loop(
settings: dict[str, Any],
new_settings: dict[str, Any],
) -> dict[str, Any]:
remove_keys: list[str] = []
for k, v in new_settings.items():
if (v is None or k == "self" or k == "config"):
remove_keys.append(k)
for k in remove_keys:
del new_settings[k]
return new_settings
def filter_unset_listcomp(
settings: dict[str, Any],
new_settings: dict[str, Any],
) -> dict[str, Any]:
remove_keys: list[str] = [
k for k, v in new_settings.items() if (v is None or k == "self" or k == "config")
]
for k in remove_keys:
del new_settings[k]
return new_settings
def make_dict(out: dict[str, Any]) -> dict[str, Any]:
for k in range(10000):
out[str(k)] = 1
return out
test_dict_template = make_dict({
"self": True,
"what": None,
"hello": "world",
"great": "key",
})
def test_loop():
filter_unset_loop(
{},
test_dict_template.copy(),
)
def test_listcomp():
filter_unset_listcomp(
{},
test_dict_template.copy(),
)
import timeit
n = 100_000
t1 = timeit.Timer(test_loop)
print("loop (seconds):", t1.timeit(number=n))
t2 = timeit.Timer(test_listcomp)
print("listcomp (seconds):", t2.timeit(number=n))
import dis
print("\nLOOP DISASSEMBLY:\n")
dis.dis(filter_unset_loop)
print("\n" * 10)
print("\nLISTCOMP DISASSEMBLY:\n")
dis.dis(filter_unset_listcomp)
Since Juan wanted me to do a test without any del statement, and with a global dictionary (so that it isn't recreated for every test), I have now done that.
When I do that, List Comprehension is 9% slower than for-append for me, in Python 3.11.
It really seems to come down to the way that list comprehensions generate a lot more bytecode to "prepare the list comprehension", which is its own form of overhead.
(Edit: This list comprehension performance issue only happens in Python 3.11, as further testing below revealed.)
from typing import Any
def filter_unset_loop(
settings: dict[str, Any],
new_settings: dict[str, Any],
) -> dict[str, Any]:
remove_keys: list[str] = []
for k, v in new_settings.items():
if (v is None or k == "self" or k == "config"):
remove_keys.append(k)
return new_settings
def filter_unset_listcomp(
settings: dict[str, Any],
new_settings: dict[str, Any],
) -> dict[str, Any]:
remove_keys: list[str] = [
k for k, v in new_settings.items() if (v is None or k == "self" or k == "config")
]
return new_settings
settings: dict[str, Any] = {}
new_settings: dict[str, Any] = {
"self": True,
"what": None,
"hello": "world",
"great": "key",
"1": 1,
"2": 1,
"3": 1,
"4": 1,
"5": 1,
"6": 1,
"7": 1,
"8": 1,
"9": 1,
"10": 1,
"11": 1,
"12": 1,
"13": 1,
"14": 1,
"15": 1,
"16": 1,
}
def test_loop():
filter_unset_loop(
settings,
new_settings,
)
def test_listcomp():
filter_unset_listcomp(
settings,
new_settings,
)
import timeit
n = 30_000_000
t1 = timeit.Timer(test_loop)
print("loop (seconds):", t1.timeit(number=n))
t2 = timeit.Timer(test_listcomp)
print("listcomp (seconds):", t2.timeit(number=n))
Some commenters wanted to use timeit to check the differences, but they used incorrect timeit commands, so here are accurate comparisons that use local variables and an appropriate number of iterations.
Most importantly, I found out what's going on! Python 3.12 has optimized list comprehensions. In Python 3.11, list comprehensions are 22% slower than .append() for small lists. In Python 3.12 they are universally faster.
$ pyenv local 3.11
$ python --version
Python 3.11.7
$ python -m timeit -n 1000 "x=[]" "for i in range(100_000): x.append(i)"
1000 loops, best of 5: 1.72 msec per loop
$ python -m timeit -n 1000 "x=[i for i in range(100_000)]"
1000 loops, best of 5: 1.34 msec per loop
$ python -m timeit -n 10000000 "x=[]" "for i in range(10): x.append(i)"
10000000 loops, best of 5: 191 nsec per loop
$ python -m timeit -n 10000000 "x=[i for i in range(10)]"
10000000 loops, best of 5: 234 nsec per loop
$ pyenv local 3.12
$ python --version
Python 3.12.1
$ python -m timeit -n 1000 "x=[]" "for i in range(100_000): x.append(i)"
1000 loops, best of 5: 2.36 msec per loop
$ python -m timeit -n 1000 "x=[i for i in range(100_000)]"
1000 loops, best of 5: 1.69 msec per loop
$ python -m timeit -n 10000000 "x=[]" "for i in range(10): x.append(i)"
10000000 loops, best of 5: 204 nsec per loop
$ python -m timeit -n 10000000 "x=[i for i in range(10)]"
10000000 loops, best of 5: 160 nsec per loop
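If you'd rather script the same comparison than use the CLI, here's a minimal equivalent using timeit.repeat, which mirrors the CLI's best-of-5 behavior (absolute numbers will of course vary by machine):
import timeit

# Mirror the `python -m timeit` runs above: best of 5 repeats, per-loop time.
number = 10_000_000
append_best = min(timeit.repeat(
    "x = []\nfor i in range(10): x.append(i)", repeat=5, number=number))
comp_best = min(timeit.repeat(
    "x = [i for i in range(10)]", repeat=5, number=number))
print(f"append loop: {append_best / number * 1e9:.0f} nsec per loop")
print(f"listcomp:    {comp_best / number * 1e9:.0f} nsec per loop")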
JonSG's answer had some good ideas, but his numbers were invalid: he ran so few iterations of each test that the results were meaningless (his code ran each test for just a few nanoseconds).
I rewrote the test to run an appropriate number of iterations per measurement, to average out variations from background tasks, CPU cache stutters, CPU throttling, boost-frequency fluctuations, etc.
After fixing his tests, they confirm what I discovered above: in Python 3.11, list comprehension is slower than for-append() loops for small lists, but the loops are slower when the lists are longer. The slowness for small lists is probably because a list comprehension requires creating a new stack frame, which in turn requires a lot of bytecode, so there's a lot of overhead for "starting" a list comprehension. This process was clearly not well optimized in Python 3.11. (Edit: This performance issue only affects Python 3.11, as the further tests of older Python versions below show!)
import sys
import timeit
def test_loop(n):
foo = []
for i in range(n):
foo.append(i)
return foo
def test_comprehension(n):
foo = [i for i in range(n)]
return foo
list_size_steps = 7 # 10 to 10 million
list_size = 10
num_elements = 60_000_000
print(f"*** python: {sys.version.split(' ')[0]} ***")
for _ in range(list_size_steps):
test_num = int(num_elements / list_size)
print(f"{list_size=}, {test_num=}")
time_loops = timeit.timeit("test_loop(list_size)", globals=globals(), number=test_num)
time_comprehension = timeit.timeit("test_comprehension(list_size)", globals=globals(), number=test_num)
pct = round(time_loops / time_comprehension, 4)
print("\tLoops vs Comp: {:3.0%} :: {:1.4f} vs {:1.4f}".format(pct, time_loops, time_comprehension))
list_size *= 10
print()
Result:
Python 3.12.1:
*** python: 3.12.1 ***
list_size=10, test_num=6000000
Loops vs Comp: 121% :: 1.3279 vs 1.0990
list_size=100, test_num=600000
Loops vs Comp: 143% :: 0.8200 vs 0.5727
list_size=1000, test_num=60000
Loops vs Comp: 128% :: 1.0748 vs 0.8421
list_size=10000, test_num=6000
Loops vs Comp: 124% :: 1.1820 vs 0.9527
list_size=100000, test_num=600
Loops vs Comp: 120% :: 1.5919 vs 1.3287
list_size=1000000, test_num=60
Loops vs Comp: 116% :: 1.7638 vs 1.5161
list_size=10000000, test_num=6
Loops vs Comp: 114% :: 1.9591 vs 1.7146
Python 3.11.7:
*** python: 3.11.7 ***
list_size=10, test_num=6000000
Loops vs Comp: 81% :: 1.3323 vs 1.6385 <--- THIS!
list_size=100, test_num=600000
Loops vs Comp: 129% :: 0.7914 vs 0.6140
list_size=1000, test_num=60000
Loops vs Comp: 116% :: 0.8822 vs 0.7592
list_size=10000, test_num=6000
Loops vs Comp: 113% :: 0.9733 vs 0.8584
list_size=100000, test_num=600
Loops vs Comp: 112% :: 1.3436 vs 1.2003
list_size=1000000, test_num=60
Loops vs Comp: 111% :: 1.5391 vs 1.3831
list_size=10000000, test_num=6
Loops vs Comp: 109% :: 1.7964 vs 1.6524
As you can see, the list comprehension performance issue only affected Python 3.11. Older and newer versions are not affected, at least among the Python versions I've tested (3.9, 3.10, 3.11, 3.12).
Python 3.10.13:
*** python: 3.10.13 ***
list_size=10, test_num=6000000
Loops vs Comp: 125% :: 2.3019 vs 1.8439
list_size=100, test_num=600000
Loops vs Comp: 217% :: 1.6378 vs 0.7552
list_size=1000, test_num=60000
Loops vs Comp: 201% :: 1.8167 vs 0.9029
list_size=10000, test_num=6000
Loops vs Comp: 190% :: 1.7905 vs 0.9442
list_size=100000, test_num=600
Loops vs Comp: 158% :: 2.2074 vs 1.3967
list_size=1000000, test_num=60
Loops vs Comp: 151% :: 2.4325 vs 1.6135
list_size=10000000, test_num=6
Loops vs Comp: 142% :: 2.5604 vs 1.8094
Python 3.9.18:
*** python: 3.9.18 ***
list_size=10, test_num=6000000
Loops vs Comp: 129% :: 2.1633 vs 1.6834
list_size=100, test_num=600000
Loops vs Comp: 190% :: 1.4944 vs 0.7847
list_size=1000, test_num=60000
Loops vs Comp: 194% :: 1.8775 vs 0.9685
list_size=10000, test_num=6000
Loops vs Comp: 158% :: 2.3659 vs 1.4978
list_size=100000, test_num=600
Loops vs Comp: 151% :: 2.4121 vs 1.5974
list_size=1000000, test_num=60
Loops vs Comp: 147% :: 2.5078 vs 1.7022
list_size=10000000, test_num=6
Loops vs Comp: 139% :: 2.7699 vs 1.9927