I'm learning Python and need to use a list comprehension to answer a question on an assignment, but I can't figure out an error I'm getting. I have a DataFrame with participants, their ages, and their scores across different tests. I tried to use a list comprehension to get a list of scores from participants under a certain age:
df['scoreunder18'] = [row for row in df['score'] if df['Age'] < 18 in row]
but got the following error:
*** ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I tried
df['scoreunder18'] = [row for row in df['score'] if (df['Age'] < 18).item in row]
but that just returns the values from the score column without honoring the condition.
Any help would be appreciated please and thank you!
The ValueError occurs because df['Age'] < 18 compares the entire column to the integer 18 and returns a boolean Series (one True/False per row), which the if clause cannot reduce to a single truth value. Even with that fixed, you may encounter a different error by assigning the list comprehension output right back into the DataFrame as df['scoreunder18'], because the length of the list may not match the length of the DataFrame's index.
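As a rough sketch of both failure modes, here is a small reproduction. The sample data and the names mask, truth_error and length_error are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({'age': [17, 21, 22], 'score': [75, 85, 95]})

# Comparing a column to a number yields a boolean Series, one value per row.
mask = df['age'] < 18

# Using that Series where Python needs a single True/False raises ValueError.
try:
    if mask:
        pass
    truth_error = None
except ValueError as exc:
    truth_error = str(exc)

# Assigning a one-item list to a three-row DataFrame raises a length mismatch.
try:
    df['scoreunder18'] = [75]
    length_error = None
except ValueError as exc:
    length_error = str(exc)

print(truth_error)
print(length_error)
```

Both messages come from pandas itself, so the exact wording may differ between versions.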
The example data below recreates that length mismatch (note that it uses a lowercase 'age' column). I used zip(), which pairs up the values of the two columns row by row as tuples.
import pandas as pd
d = {'participant': ['a', 'b', 'c'], 'age': [17, 21, 22], 'score': [75, 85, 95]}
df = pd.DataFrame(data=d)
list_under_18 = [sc for ag, sc in zip(df['age'], df['score']) if ag < 18]
This gives list_under_18 = [75], which has a length of one while the DataFrame index has a length of three. To attach it as a column to the original DataFrame, convert the list to a Series; pandas aligns the Series on its default index (0, 1, ...) and fills the remaining rows with NaN.
df['under_18_scores'] = pd.Series([sc for ag, sc in zip(df['age'], df['score']) if ag < 18])
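One caveat: because the Series built from the filtered list is numbered from 0, the scores land in the first rows of the DataFrame, which matches the right participant here only because the under-18 participant happens to be in row 0. If the scores need to stay next to the participants who earned them, one possible approach (not part of the original answer) is to give the comprehension an else branch so it produces one entry per row, or to use Series.where. A minimal sketch, with hypothetical data in which the under-18 participant is in the second row:

```python
import pandas as pd

# Hypothetical data: the under-18 participant is deliberately not in row 0.
df = pd.DataFrame({'participant': ['a', 'b', 'c'],
                   'age': [21, 17, 22],
                   'score': [85, 75, 95]})

# Filtering comprehension: only the matching scores, so it is shorter than df.
list_under_18 = [sc for ag, sc in zip(df['age'], df['score']) if ag < 18]

# Conditional expression instead of a filter: one entry per row, NaN where
# the condition fails, so the list lines up with the DataFrame's rows.
df['under_18_scores'] = [sc if ag < 18 else float('nan')
                         for ag, sc in zip(df['age'], df['score'])]

# pandas alternative: where() keeps values where the mask is True, NaN elsewhere.
df['under_18_where'] = df['score'].where(df['age'] < 18)

print(df)
```

The filtering form is still the one to use when only the plain list of scores is needed; the other two are for when the result has to be stored as a column.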
Here are some similar answers for reference:
Adding list with different length as a new column to a dataframe