Pretend you have some expensive, cpu-intensive function, for example parsing an xml string. In this case, our trivial function will be:
def parse(foo):
return int(foo)
As input, you have a list of strings, and you want to parse them and find the subset of parsed strings that meet some condition. Ideally we want to perform the parse only one time per string.
Without a list comprehension, you could:
olds = ["1", "2", "3", "4", "5"]
news = []
for old in olds:
new = parse(old) # First and only Parse
if new > 3:
news.append(new)
To do this as a list comprehension, it seems that you have to perform the parse twice, once to get the new value and once to perform the conditional check:
olds = ["1", "2", "3", "4", "5"]
news = [
parse(new) # First Parse
for new in olds
if parse(new) > 3 # Second Parse
]
For example, this syntax will not work:
olds = ["1", "2", "3", "4", "5"]
# Raises SyntaxError: can't assign to function call
news = [i for parse(i) in olds if i > 5]
Using a generator seems to work:
def parse(strings):
for string in strings:
yield int(string)
olds = ["1", "2", "3", "4", "5"]
news = [i for i in parse(olds) if i > 3]
However you could just throw the conditional in the generator:
def parse(strings):
for string in strings:
val = int(string)
if val > 3:
yield val
olds = ["1", "2", "3", "4", "5"]
news = [i for i in parse(olds)]
What I would like to know is, in terms of optimization (not reusability, etc), which one is better, the one where the parsing occurs in the generator but the conditional check occurs in the list comprehension, or the one where both the parsing and the conditional check occurs in the generator? Is there a better alternative than either of these approaches?
Here are some output of dis.dis
in Python 3.6.5. Note that in my version of Python, in order to disassemble list comprehensions, we have to use f.__code__.co_consts[1]
. Check this answer for an explanation.
def parse(strings):
for string in strings:
yield int(string)
def main(strings):
return [i for i in parse(strings) if i > 3]
assert main(["1", "2", "3", "4", "5"]) == [4, 5]
dis.dis(main.__code__.co_consts[1])
"""
2 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 16 (to 22)
6 STORE_FAST 1 (i)
8 LOAD_FAST 1 (i)
10 LOAD_CONST 0 (3)
12 COMPARE_OP 4 (>)
14 POP_JUMP_IF_FALSE 4
16 LOAD_FAST 1 (i)
18 LIST_APPEND 2
20 JUMP_ABSOLUTE 4
>> 22 RETURN_VALUE
"""
dis.dis(parse)
"""
2 0 SETUP_LOOP 22 (to 24)
2 LOAD_FAST 0 (strings)
4 GET_ITER
>> 6 FOR_ITER 14 (to 22)
8 STORE_FAST 1 (string)
3 10 LOAD_GLOBAL 0 (int)
12 LOAD_FAST 1 (string)
14 CALL_FUNCTION 1
16 YIELD_VALUE
18 POP_TOP
20 JUMP_ABSOLUTE 6
>> 22 POP_BLOCK
>> 24 LOAD_CONST 0 (None)
26 RETURN_VALUE
"""
def parse(strings):
for string in strings:
val = int(string)
if val > 3:
yield val
def main(strings):
return [i for i in parse(strings)]
assert main(["1", "2", "3", "4", "5"]) == [4, 5]
dis.dis(main.__code__.co_consts[1])
"""
2 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 8 (to 14)
6 STORE_FAST 1 (i)
8 LOAD_FAST 1 (i)
10 LIST_APPEND 2
12 JUMP_ABSOLUTE 4
>> 14 RETURN_VALUE
"""
dis.dis(parse)
"""
2 0 SETUP_LOOP 34 (to 36)
2 LOAD_FAST 0 (strings)
4 GET_ITER
>> 6 FOR_ITER 26 (to 34)
8 STORE_FAST 1 (string)
3 10 LOAD_GLOBAL 0 (int)
12 LOAD_FAST 1 (string)
14 CALL_FUNCTION 1
16 STORE_FAST 2 (val)
4 18 LOAD_FAST 2 (val)
20 LOAD_CONST 1 (3)
22 COMPARE_OP 4 (>)
24 POP_JUMP_IF_FALSE 6
5 26 LOAD_FAST 2 (val)
28 YIELD_VALUE
30 POP_TOP
32 JUMP_ABSOLUTE 6
>> 34 POP_BLOCK
>> 36 LOAD_CONST 0 (None)
38 RETURN_VALUE
def parse(string):
return int(string)
def main(strings):
values = []
for string in strings:
value = parse(string)
if value > 3:
values.append(value)
return values
assert main(["1", "2", "3", "4", "5"]) == [4, 5]
dis.dis(main)
"""
2 0 BUILD_LIST 0
2 STORE_FAST 1 (values)
3 4 SETUP_LOOP 38 (to 44)
6 LOAD_FAST 0 (strings)
8 GET_ITER
>> 10 FOR_ITER 30 (to 42)
12 STORE_FAST 2 (string)
4 14 LOAD_GLOBAL 0 (parse)
16 LOAD_FAST 2 (string)
18 CALL_FUNCTION 1
20 STORE_FAST 3 (value)
5 22 LOAD_FAST 3 (value)
24 LOAD_CONST 1 (3)
26 COMPARE_OP 4 (>)
28 POP_JUMP_IF_FALSE 10
6 30 LOAD_FAST 1 (values)
32 LOAD_ATTR 1 (append)
34 LOAD_FAST 3 (value)
36 CALL_FUNCTION 1
38 POP_TOP
40 JUMP_ABSOLUTE 10
>> 42 POP_BLOCK
7 >> 44 LOAD_FAST 1 (values)
46 RETURN_VALUE
"""
dis.dis(parse)
"""
2 0 LOAD_GLOBAL 0 (int)
2 LOAD_FAST 0 (string)
4 CALL_FUNCTION 1
6 RETURN_VALUE
"""
Note how the disassembly of the first two, that use list comprehensions with generators, indicate two for loops, one in the main (list comprehension) and one in the parse (generator). This isn't as bad as it sounds, right? E.g., the entire operation is O(n) and not O(n^2) ?
def parse(string):
return int(string)
def main(strings):
return [val for val in (parse(string) for string in strings) if val > 3]
assert main(["1", "2", "3", "4", "5"]) == [4, 5]
dis.dis(main.__code__.co_consts[1])
"""
2 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 16 (to 22)
6 STORE_FAST 1 (val)
8 LOAD_FAST 1 (val)
10 LOAD_CONST 0 (3)
12 COMPARE_OP 4 (>)
14 POP_JUMP_IF_FALSE 4
16 LOAD_FAST 1 (val)
18 LIST_APPEND 2
20 JUMP_ABSOLUTE 4
>> 22 RETURN_VALUE
"""
dis.dis(parse)
"""
2 0 LOAD_GLOBAL 0 (int)
2 LOAD_FAST 0 (string)
4 CALL_FUNCTION 1
6 RETURN_VALUE
"""
I think you can do it more simply than you think:
olds = ["1", "2", "3", "4", "5"]
news = [new for new in (parse(old) for old in olds) if new > 3]
Or just:
news = [new for new in map(parse, olds) if new > 3]
Both of those ways parse
is only called once per item.