python pandas nested list-comprehension python-itertools

Double list comprehension in ordinary speech


I've implemented the following list comprehension in my code, and it works:

[string for row in series for string in row]

Background: I had a pandas series of lists of strings. So each row of the series had a list, and each list had several strings. So I wanted to use a list comprehension to pull out all the strings from each list in the series and compile them into one big list.

The question: Just reading the syntax, I'm having trouble understanding intuitively what's going on in the comprehension. Can anyone spell it out in plain English? For example, for a standard list comprehension ([x for x in z]), I might describe that as "a list with an x for every x in z."

I don't know if this is really a doable question, but I thought it was worth asking! Thanks.


Solution

  • All it does is flatten a list of lists. For example:

    nested_list = [[1, 2, 3],
                   [4],
                   [5, 6]]
    flat_list = [item for inner_list in nested_list for item in inner_list]
    
    # flat_list will be [1, 2, 3, 4, 5, 6]
    

    To understand it, just write it out as a nested for loop:

    result = []
    for row in series:
        for string in row:
            result.append(string)
    

    Basically, it reads left to right as a nested loop, but the innermost expression (the part you would append) comes at the start.

    You can kind of see this by messing up the spacing in your original code:

    result = [
        string 
        for row in series # : <- pretend colons
            for string in row # : 
                # result.append(string) <- this bit just goes to the start in list comprehension land
    ]
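
To check that reading on concrete data, here is a minimal sketch. The sample `series` is a plain list of lists standing in for the pandas Series (an assumption, so the snippet runs without pandas), and the values are made up for illustration.

```python
# Hypothetical sample data: a plain list of lists standing in for the Series.
series = [["a", "b"], ["c"], ["d", "e", "f"]]

# Comprehension form: the appended expression comes first, followed by the
# for clauses in the same order they would be nested as ordinary loops.
flat = [string for row in series for string in row]

# Equivalent explicit nested loops.
result = []
for row in series:
    for string in row:
        result.append(string)

print(flat == result)  # True
print(flat)            # ['a', 'b', 'c', 'd', 'e', 'f']
```

In ordinary speech, that comes out roughly as "a list with each string, for every row in the series, for every string in that row."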
    

    By the way, you can apparently do this faster using itertools.chain (though I'm not sure whether that still holds for a pd.Series):

    import itertools
    result = list(itertools.chain(*series.tolist()))
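
A closely related variant is `itertools.chain.from_iterable`, which takes the outer iterable directly instead of unpacking it with `*`. The sketch below uses a plain list of lists named `rows` as a stand-in for the Series (an assumption, so it runs without pandas); since iterating over a Series yields its elements, passing the Series itself should behave the same way, but that is not tested here, and no speed claim is implied.

```python
from itertools import chain

# Hypothetical sample data standing in for the Series of lists.
rows = [["a", "b"], ["c"], ["d", "e", "f"]]

# chain.from_iterable walks each inner list in turn, lazily.
flat_chain = list(chain.from_iterable(rows))

# Same result as the double list comprehension.
flat_comp = [s for row in rows for s in row]

print(flat_chain == flat_comp)  # True
```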