
Double list comprehension in ordinary speech

I've implemented the following list comprehension in my code, and it works:

[string for row in series for string in row]

Background: I had a pandas Series of lists of strings: each row of the Series held a list, and each list held several strings. I wanted to use a list comprehension to pull all the strings out of every list in the Series and collect them into one big list.

The question: Just reading the syntax, I'm having trouble understanding intuitively what's going on in the comprehension. Can anyone spell it out in plain English? For example, I might describe a standard list comprehension ([x for x in z]) as "a list with an x for every x in z."

I don't know if this is really a doable question, but I thought it was worth asking! Thanks.

Solution

All it does is flatten a list of lists. For example:

nested_list = [[1, 2, 3],
               [4],
               [5, 6]]
flat_list = [item for inner_list in nested_list for item in inner_list]

# flat_list will be [1, 2, 3, 4, 5, 6]
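Applied to data shaped like yours, the same comprehension works directly on a pd.Series, because iterating over a Series yields its values (here, the inner lists). A minimal sketch, assuming pandas is installed; the sample strings are made up:

```python
import pandas as pd

# Hypothetical sample data shaped like the question's: each row holds a list of strings
series = pd.Series([["apple", "banana"], ["cherry"], ["date", "elderberry", "fig"]])

# Iterating over a Series yields its values, so each `row` is one inner list
all_strings = [string for row in series for string in row]
print(all_strings)
# ['apple', 'banana', 'cherry', 'date', 'elderberry', 'fig']
```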

To understand it, just write it out as a nested for loop:

result = []
for row in series:
    for string in row:
        result.append(string)
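A quick check that the loop and the comprehension agree. The sample data is hypothetical, and a plain list of lists stands in for the Series:

```python
# Hypothetical sample data: a plain list of lists standing in for the Series
series = [["a", "b"], ["c"], ["d", "e", "f"]]

# Explicit nested loops: outer loop over rows, inner loop over the strings in a row
result = []
for row in series:
    for string in row:
        result.append(string)

# The comprehension lists the same two for clauses in the same order
comprehension = [string for row in series for string in row]

print(result == comprehension)  # True
print(result)  # ['a', 'b', 'c', 'd', 'e', 'f']
```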

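Basically, the for clauses read left to right in the same order as the nested loops read top to bottom; the only difference is that the body of the innermost loop (the value being appended) moves to the start.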
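One way to see the left-to-right order is to record which loop produced each element. In this small sketch (the names outer and inner are just illustrative), the last for clause is the innermost loop, so it changes fastest:

```python
# Record which (outer, inner) pair produced each element
pairs = [(outer, inner) for outer in range(2) for inner in range(3)]
print(pairs)
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]

# Same thing as explicit loops: the first for clause is the outer loop
loop_pairs = []
for outer in range(2):
    for inner in range(3):
        loop_pairs.append((outer, inner))

print(loop_pairs == pairs)  # True
```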

You can see this by re-indenting your original code so it looks like the loops:

result = [
    string 
    for row in series # : <- pretend colons
        for string in row # : 
            # result.append(string) <- this bit just goes to the start in list comprehension land
]
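Because the clauses are read left to right like nested loops, their order matters: a later clause can use a name bound by an earlier one, but not the other way round. A small sketch with made-up data; run it in a fresh interpreter, since an existing variable named row would be picked up instead of raising an error:

```python
series = [["a", "b"], ["c"]]

# Correct order: the first clause binds `row`, which the second clause then uses
flat = [string for row in series for string in row]
print(flat)  # ['a', 'b', 'c']

# Reversed order: the first clause tries to iterate over `row` before any
# clause has bound it, so Python raises NameError (wording varies by version)
reversed_failed = False
try:
    [string for string in row for row in series]
except NameError as err:
    reversed_failed = True
    print(type(err).__name__, err)
```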

By the way, you can apparently do this faster using itertools.chain (though I'm not sure that still holds for a pd.Series):

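import itertools
# chain.from_iterable walks each inner list in turn, without unpacking the Series into arguments
result = list(itertools.chain.from_iterable(series))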
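If you want to check the "faster" claim on your own data, a rough timeit comparison is one way to do it. This is a sketch, assuming pandas is installed; the Series size and contents are arbitrary, it also tries itertools.chain.from_iterable (which takes the iterable of lists directly instead of unpacking it with *), and the timings will vary by machine and pandas version:

```python
import itertools
import timeit

import pandas as pd

# Arbitrary benchmark data: 10,000 rows, each a list of 5 strings
series = pd.Series([[f"s{i}_{j}" for j in range(5)] for i in range(10_000)])


def with_comprehension():
    return [string for row in series for string in row]


def with_chain_star():
    # Unpack the rows as separate arguments to chain
    return list(itertools.chain(*series.tolist()))


def with_chain_from_iterable():
    # chain.from_iterable pulls the rows from the Series one at a time
    return list(itertools.chain.from_iterable(series))


# All three should build the same flat list
assert with_comprehension() == with_chain_star() == with_chain_from_iterable()

for func in (with_comprehension, with_chain_star, with_chain_from_iterable):
    seconds = timeit.timeit(func, number=20)
    print(f"{func.__name__}: {seconds:.3f} s for 20 runs")
```

Whichever comes out ahead, all three return the same list, so the choice is mostly about readability.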