Search code examples
pythonpandaspandas-apply

Pandas DataFrame aggregated column with names of other columns as value


I'm trying to create a new column in my DataFrame that is a list of aggregated column names. Here's a sample DataFrame:

In [1]: df = pd.DataFrame({'A':[1,2,3],
               'B':[4,5,6],
               'C':[7,8,9],
               'D':[1,3,5],
               'E':[5,3,6],
               'F':[7,4,3]})
In [2]: df
Out[2]:
   A  B  C  D  E  F
0  1  4  7  1  5  7
1  2  5  8  3  3  4
2  3  6  9  5  6  3

I'd like to create a new column containing a list of column names where a certain condition is met. Say that I'm interested in columns where value > 3 -- I would want an output that looks like this:

In [3]: df
Out[3]:
   A  B  C  D  E  F  Flag
0  1  4  7  1  5  7  ['B', 'C', 'E', 'F']
1  2  5  8  3  3  4  ['B', 'C', 'F']
2  3  6  9  5  6  3  ['B', 'C', 'D', 'E']

Currently, I'm using apply:

df['Flag'] = df.apply(lambda row: [list(df)[i] for i, j in enumerate(row) if j > 3], axis = 1)

This gets the job done, but feels clunky and I'm wondering if there is a more elegant solution.

Thanks!


Solution

  • Use df.dot() here:

    df['Flag']=(df>3).dot(df.columns).apply(list)
    print(df)
    

       A  B  C  D  E  F          Flag
    0  1  4  7  1  5  7  [B, C, E, F]
    1  2  5  8  3  3  4     [B, C, F]
    2  3  6  9  5  6  3  [B, C, D, E]