I'm trying to create a new column in my DataFrame
that is a list of aggregated column names. Here's a sample DataFrame
:
In [1]: df = pd.DataFrame({'A':[1,2,3],
'B':[4,5,6],
'C':[7,8,9],
'D':[1,3,5],
'E':[5,3,6],
'F':[7,4,3]})
In [2]: df
Out[2]:
A B C D E F
0 1 4 7 1 5 7
1 2 5 8 3 3 4
2 3 6 9 5 6 3
I'd like to create a new column containing a list of column names where a certain condition is met. Say that I'm interested in columns where value > 3 -- I would want an output that looks like this:
In [3]: df
Out[3]:
A B C D E F Flag
0 1 4 7 1 5 7 ['B', 'C', 'E', 'F']
1 2 5 8 3 3 4 ['B', 'C', 'F']
2 3 6 9 5 6 3 ['B', 'C', 'D', 'E']
Currently, I'm using apply
:
df['Flag'] = df.apply(lambda row: [list(df)[i] for i, j in enumerate(row) if j > 3], axis = 1)
This gets the job done, but feels clunky and I'm wondering if there is a more elegant solution.
Thanks!
Use df.dot()
here:
df['Flag']=(df>3).dot(df.columns).apply(list)
print(df)
A B C D E F Flag
0 1 4 7 1 5 7 [B, C, E, F]
1 2 5 8 3 3 4 [B, C, F]
2 3 6 9 5 6 3 [B, C, D, E]