Search code examples
pythonpython-3.xrun-length-encoding

Run Length Encoding the code


I am very new to coding and picked python as my first coding language. I am working on a exercise called Run Length Encoding. After some searching I found solution but I am having hard time understanding the code. Can someone break this code and explain in simple words. Thank You.

string = 'WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWB'
x=''.join(['{}{}'.format(k, sum(1 for _ in g)) for k, g in groupby(string)])

Solution

  • The first thing to do is to break down the expression into smaller ones:

    bits = ['{}{}'.format(k, sum(1 for _ in g)) for k, g in groupby(string)]
    x=''.join(bits)
    

    The second one is easy: we have some list of bits, each of which is a string, and we just join them up into one big string.

    The first one is a list comprehension. Every list comprehension can be rewritten as a for statement around an append, so let's do that:

    bits = []
    for k, g in groupby(string):
        bits.append('{}{}'.format(k, sum(1 for _ in g)))
    

    The groupby part may seem tricky if you've never seen groupby before, but if you just call it separately, it should be pretty obvious:

    for k, g in groupby(string):
        print(k, list(g))
    

    This gives you:

    W ['W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W']
    B ['B']
    W ['W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W']
    B ['B', 'B', 'B']
    W ['W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W', 'W']
    B ['B']
    

    In other words, each group g is a run of equal elements, and k is just the thing they're all equal to.


    Now let's break down the inner statement:

    bits.append('{}{}'.format(k, sum(1 for _ in g)))
    

    into parts:

    count = sum(1 for _ in g)
    bit = '{}{}'.format(k, count)
    bits.append(bit)
    

    The last two lines are hopefully obvious. So, that just leaves us with the first one.


    We're calling sum on a generator expression. Generator expressions are just like list comprehensions but lazy, and we don't care about the laziness here, so we can break it down the same way we did above:

    things = []
    for _ in g:
        things.append(1)
    count = sum(things)
    

    So now it should be obvious what sum(1 for _ in g) does: it's just the number of things in g. In fact, it's just like calling len(g), except that it works for arbitrary iterables, including lazy iterators, not just sequences.

    This is the idiomatic way to count possibly-lazy iterables—but we could have replaced it (at the cost of a bit of performance) with:

    count = len(list(g))
    

    So, putting it all back together:

    • Use groupby to convert the string into a bunch of groups, each of which is the same character repeated over and over.
    • For each one:
      • Count the length of that group.
      • Make a string like 'W12' using the key 'W' and the fact that the group of Ws has 12 members.
      • Append that to a list.
    • Take that list of ['W12', 'B1', 'W12', 'B3', 'W24', 'B1'] and join it into a string 'W12B1W12B3W24B1'.