I am very new to coding and picked python as my first coding language. I am working on a exercise called Run Length Encoding. After some searching I found solution but I am having hard time understanding the code. Can someone break this code and explain in simple words. Thank You.
string = 'WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWB'
x=''.join(['{}{}'.format(k, sum(1 for _ in g)) for k, g in groupby(string)])
The first thing to do is to break down the expression into smaller ones:
bits = ['{}{}'.format(k, sum(1 for _ in g)) for k, g in groupby(string)]
x=''.join(bits)
The second one is easy: we have some list of bits, each of which is a string, and we just join them up into one big string.
The first one is a list comprehension. Every list comprehension can be rewritten as a for
statement around an append
, so let's do that:
bits = []
for k, g in groupby(string):
bits.append('{}{}'.format(k, sum(1 for _ in g)))
The groupby
part may seem tricky if you've never seen groupby
before, but if you just call it separately, it should be pretty obvious:
for k, g in groupby(string):
print(k, list(g))
This gives you:
W ['W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W']
B ['B']
W ['W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W']
B ['B', 'B', 'B']
W ['W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W']
B ['B']
In other words, each group g
is a run of equal elements, and k
is just the thing they're all equal to.
Now let's break down the inner statement:
bits.append('{}{}'.format(k, sum(1 for _ in g)))
into parts:
count = sum(1 for _ in g)
bit = '{}{}'.format(k, count)
bits.append(bit)
The last two lines are hopefully obvious. So, that just leaves us with the first one.
We're calling sum
on a generator expression. Generator expressions are just like list comprehensions but lazy, and we don't care about the laziness here, so we can break it down the same way we did above:
things = []
for _ in g:
things.append(1)
count = sum(things)
So now it should be obvious what sum(1 for _ in g)
does: it's just the number of things in g
. In fact, it's just like calling len(g)
, except that it works for arbitrary iterables, including lazy iterators, not just sequences.
This is the idiomatic way to count possibly-lazy iterables—but we could have replaced it (at the cost of a bit of performance) with:
count = len(list(g))
So, putting it all back together:
groupby
to convert the string into a bunch of groups, each of which is the same character repeated over and over.'W12'
using the key 'W'
and the fact that the group of W
s has 12 members.['W12', 'B1', 'W12', 'B3', 'W24', 'B1']
and join it into a string 'W12B1W12B3W24B1'
.