Search code examples
pythonpandasdataframemultiple-columnsisin

How to Create a Single Column off multiple columns in Pandas using .isin() and a list?


I have broken a more complex problem into something simpler. The actual problem has larger lists and several more columns.

Starting with this df:

 i |     COL1   |      COL2   |    COL3     |    COL4  |  Revenue    |  QTY    | Products
 
0  |      Coin  |   Gold Krug | Gold Coin   |  Coins   | 2333677473  |   21    |      12

1  | Gold Coin  |     Coins   | Gold Coin   |  Coins   | 2564774784  |   28    |    14

2  | Gold Coin  |     Coins   | Gold Krug   | Coins    |3256666647   |   35    |     16

3  |Gold Coin   |    Coins    |  Coins      |Gold Krug |    3456788  |   42    |     18

4  |Gold Krug   | Gold Coin   |  Coins      | Coins    |  4588960    | 49      |   20

5  |Gold Coin   |    Coins    | Gold Krug   | Coins    |346869909    |56       | 22

6  |Gold Coin   |    Coins    | Gold Coin   |  Coins   | 3777989     |63       | 24

7  |Gold Coin   |Silver Krug  |Gold Coin    | Coins    | 37687589    |70       | 26

8  |Gold Coin   |    Coins    |Gold Coin    | Coins    | 45789889    |77       | 28

9  |Gold Coin   | Gold Krug   |Gold Coin    |Coins     |    468      |84       | 30

I would like output to be DF with a new column like this:

i |  Category    |    Revenue         | QTY   |Products 

0 |Gold Krug     |  2333677473        |21     |    12

2 |Gold Krug     |  3256666647        | 35    |     16

3 |Gold Krug     |     3456788        | 42    |     18

4 | Gold Krug    |      4588960       |  49   |      20

5 | Gold Krug    |    346869909       |  56   |      22

7 | Silver Krug  |     37687589       |  70   |      26

9 | Gold Krug    |          468       |  84   |      30

I used this, but simply do not understand how to create a new column using the value in the list that matches for the new column :

KRUG = ['Gold Krug', 'Silver Krug', 'Gold Maple','Gold Eagle']

df = df[df[['COL1', 'COL2', 'COL3', 'COL4 ']].isin(KRUG).any(axis=1)]

print(df)

output :
i   |COL1         |COL2          |COL3          |COL4       |Revenue    |QTY    |Products
 
0   |Coin         |Gold Krug     |Gold Coin     |Coins      |2333677473 |21     |12

2   |Gold Coin    |Coins         |Gold Krug     |Coins      |3256666647 |35     |16

3   |Gold Coin    |Coins         |Coins         |Gold Krug  |3456788    |42     |18

4   |Gold Krug    |Gold Coin     |Coins         |Coins      |4588960    |49     |20

5   |Gold Coin    |Coins         |Gold Krug     |Coins      |346869909  |56     |22

7   |Gold Coin    |Silver Krug   |Gold Coin     |Coins      |37687589   |70     |26

9   |Gold Coin    |Gold Krug     |Gold Coin     |Coins      |468        |84     |30

Solution

  • Split the search into two sections, and concatenate:

    category = (df.filter(like='COL')
                  .agg(','.join, axis = 1)
                  .str.extract(fr"({'|'.join(KRUG)})")
                  .dropna()
                  .set_axis(['category'], axis = 'columns')
                )
    
    others = df.loc[df.filter(like='COL').isin(KRUG).any(1), 
                    ['Revenue', 'QTY', 'Products']]
    
    pd.concat([category, others], axis = 'columns')
    
          category     Revenue  QTY  Products
    0    Gold Krug  2333677473   21        12
    2    Gold Krug  3256666647   35        16
    3    Gold Krug     3456788   42        18
    4    Gold Krug     4588960   49        20
    5    Gold Krug   346869909   56        22
    7  Silver Krug    37687589   70        26
    9    Gold Krug         468   84        30