I've implemented the following list comprehension in my code, and it works:
[string for row in series for string in row]
Background: I had a pandas series of lists of strings. So each row of the series had a list, and each list had several strings. So I wanted to use a list comprehension to pull out all the strings from each list in the series and compile them into one big list.
The question: Just reading the syntax, I'm having trouble understanding intuitively what's going on in the comprehension. Can anyone spell it out in plain English? For example, for a standard list comprehension ([x for x in z]), I might describe that as "a list with an x for every x in z."
I don't know if this is really a doable question, but I thought it was worth asking! Thanks.
All it does is flatten a list of lists. For example:
nested_list = [[1, 2, 3],
               [4],
               [5, 6]]
flat_list = [item for inner_list in nested_list for item in inner_list]
# flat_list will be [1, 2, 3, 4, 5, 6]
To understand it, just write it out as a nested for loop:
result = []
for row in series:
    for string in row:
        result.append(string)
Basically, it reads left to right as a nested loop (outer loop first, then inner loop), except that the innermost expression, the thing you would append, comes at the start.
You can kind of see this by messing up the spacing in your original code:
result = [
    string
    for row in series          # : <- pretend colons
        for string in row      # :
            # result.append(string) <- this bit just goes to the start in list comprehension land
]
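If you want to convince yourself the two forms are equivalent, here is a small sketch. It uses a plain list of lists (`rows`, a made-up stand-in for your series) so it runs without pandas; the names are only for illustration.

```python
# Hypothetical stand-in for the pandas Series: a plain list of lists of strings.
rows = [["a", "b"], ["c"], ["d", "e", "f"]]

# Comprehension form: the expression comes first, then the loops in nesting order.
flat_comprehension = [string for row in rows for string in row]

# The same thing written as explicit nested loops.
flat_loop = []
for row in rows:
    for string in row:
        flat_loop.append(string)

print(flat_comprehension == flat_loop)  # True
print(flat_comprehension)               # ['a', 'b', 'c', 'd', 'e', 'f']
```

Both versions visit the rows in order and, within each row, the strings in order, so the results match element for element.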
By the way, you can apparently do this faster using itertools.chain (but I'm not sure whether that still holds for a pd.Series):
import itertools
result = list(itertools.chain(*series.tolist()))
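A variant that avoids unpacking with `*` is `itertools.chain.from_iterable`, which takes the outer iterable directly. The sketch below again uses a plain list of lists (`rows`, a hypothetical stand-in for the series). Iterating over a Series yields its values, so I'd expect the same call to work on the series itself, but I haven't timed it there.

```python
import itertools

# Hypothetical stand-in for the pandas Series: a plain list of lists of strings.
rows = [["a", "b"], ["c"], ["d", "e", "f"]]

# chain.from_iterable lazily walks each inner list in turn;
# list() collects the items into one flat list.
flat = list(itertools.chain.from_iterable(rows))

print(flat)  # ['a', 'b', 'c', 'd', 'e', 'f']
```

Because `from_iterable` consumes the outer iterable lazily, it does not need to build the full argument tuple that `chain(*...)` requires.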