This question is regarding the solution to the (first half of) Day 5 problem for Advent of Code 2020.
I have written two different functions to get to the same result, i.e. decoding the boarding pass string into a row, column coordinate. In the first case I went through a binary search based upon each character in the string:
def decode(bp):
row = bp[:7]
col = bp[-3:]
row_lower, row_upper = 0, 127
col_lower, col_upper = 0, 7
for char in row:
if char=='F':
row_upper = ((row_upper+row_lower))//2
else:
row_lower = ((row_upper+row_lower)+1)//2
for char in col:
if char=='L':
col_upper = ((col_upper+col_lower))//2
else:
col_lower = ((col_upper+col_lower)+1)//2
sid = (row_lower*8)+col_lower
return (row_lower, col_lower, sid)
I then realized that if we consider the string as a binary number, there is a 1:1 map to each row, column coordinate, and so I also wrote this alternative solution to the problem:
def alternative_decode(bp):
bp = bp.replace('F', '0').replace('L', '0').replace('B', '1').replace('R', '1')
return(int(bp[:7], 2), int(bp[-3:], 2), (int(bp[:7], 2)*8)+int(bp[-3:], 2))
I wrote the second solution because I expected it to be considerably faster than the first, given that it's a simple binary conversion rather than a binary search. However, when timing the two approaches, I noticed that both have the same running time, i.e roughly between 0.0000020 and 0.0000025 seconds.
What is the cause of this? Is there some Python magic happening behind the scenes that makes both solutions equally efficient, or did I write them in such a way that makes them equally unefficient?
On my computer, your alternative becomes significantly faster if you reuse the row/col result, rather than repeating the int() calls for the sid
return:
def alternative_decode_reuse(bp):
bp = bp.replace('F', '0').replace('L', '0').replace('B', '1').replace('R', '1')
row = int(bp[:7], 2)
col = int(bp[-3:], 2)
sid = row*8 + col
return (row, col, sid)
After putting all three functions in a file decoding.py
, this gives us:
tmp$ python -m timeit -s 'import decoding' -- 'decoding.decode("FFBBFBFLLR")'
1000000 loops, best of 3: 1.75 usec per loop
tmp$ python -m timeit -s 'import decoding' -- 'decoding.alternative_decode("FFBBFBFLLR")'
1000000 loops, best of 3: 1.62 usec per loop
tmp$ python -m timeit -s 'import decoding' -- 'decoding.alternative_decode_reuse("FFBBFBFLLR")'
1000000 loops, best of 3: 1.32 usec per loop
At this point, most of the time is spent doing the replace()
line. So what if we didn't?
def third_decode(bp):
row = 0
for c in bp[:7]:
row <<= 1
if c == 'B':
row += 1
col = 0
for c in bp[7:]:
col <<= 1
if c == 'R':
col += 1
sid = row * 8 + col
return (row, col, sid)
This gives:
tmp$ python -m timeit -s 'import decoding' -- 'decoding.third_decode("FFBBFBFLLR")'
1000000 loops, best of 3: 1.37 usec per loop
Slightly worse, or at least not clearly better. What if we also use the fact that the desired sid
is equivalent to a (binary) concatenation of the row/col numbers?
def fourth_decode(bp):
sid = 0
for c in bp:
sid <<= 1
if c in 'BR':
sid += 1
row = sid >> 3
col = sid & 7
return (row, col, sid)
Yes, that helps a little:
tmp$ python -m timeit -s 'import decoding' -- 'decoding.fourth_decode("FFBBFBFLLR")'
1000000 loops, best of 3: 1.16 usec per loop
At this point, I'm tired of editing cmdline args to re-run everything, so let's add this to the bottom of decoding.py
:
if __name__ == '__main__':
loops = 1000000
funcs = (
decode, alternative_decode, alternative_decode_reuse, replace_only,
third_decode, fourth_decode,
)
from timeit import Timer
for fun in funcs:
cmd = 'decoding.%s("FFBBFBFLLR")' % fun.__name__
timer = Timer(cmd, setup='import decoding')
totaltime = min(timer.repeat(5, loops))
fmt = '%25s returned %14r -- %8d loops, best of 5: %6d ns per loop'
arg = (fun.__name__, fun('FFBBFBFLLR'), loops, totaltime*1000000000/loops)
print(fmt % arg)
This lets us run all the functions without hassle:
tmp$ python decoding.py
decode returned (26, 1, 209) -- 1000000 loops, best of 5: 2090 ns per loop
alternative_decode returned (26, 1, 209) -- 1000000 loops, best of 5: 1829 ns per loop
alternative_decode_reuse returned (26, 1, 209) -- 1000000 loops, best of 5: 1414 ns per loop
replace_only returned None -- 1000000 loops, best of 5: 700 ns per loop
third_decode returned (26, 1, 209) -- 1000000 loops, best of 5: 1368 ns per loop
fourth_decode returned (26, 1, 209) -- 1000000 loops, best of 5: 1123 ns per loop
After that, I'm out of ideas of how to make it go faster. But last year, an Advent of Code-solving acquaintance told me he's using the pypy
Python implementation for its speed. Maybe it can help?
tmp$ pypy decoding.py
decode returned (26, 1, 209) -- 1000000 loops, best of 5: 151 ns per loop
alternative_decode returned (26, 1, 209) -- 1000000 loops, best of 5: 4 ns per loop
alternative_decode_reuse returned (26, 1, 209) -- 1000000 loops, best of 5: 3 ns per loop
replace_only returned None -- 1000000 loops, best of 5: 3 ns per loop
third_decode returned (26, 1, 209) -- 1000000 loops, best of 5: 141 ns per loop
fourth_decode returned (26, 1, 209) -- 1000000 loops, best of 5: 138 ns per loop
Well, all my effort for nothing! :)
It looks like pypy's replace()
and int()
functions are much faster. Also, while its JIT does make our various loopy functions faster, it's still preferable to use the builtin functions when possible.