Search code examples
pythonpandasdataframemultiple-columns

How to create columns from specific values per row in Python


I have similar df like below,

df = pd.DataFrame({'DRINKS':['WHISKEY','VODKA','WATER'],
                    'STRONG':[5,5,0],
                    'SOUR':[5,4,0]})

And I want to transform it to this one (Read 5s from the dataframe and when it matches, create a column with whatever name(I named it Cat1) and get the column name(STRONG) where the value was 5, then move on tho the next column and do the same operation until there no columns with rows with a value 5. The final outcome should be like below:

df = pd.DataFrame({'DRINKS':['WHISKEY','VODKA','WATER'],
                    'Cat1':["STRONG","STRONG",np.nan],
                    'Cat2':["SOUR",np.nan,np.nan]})

I tried to do it with

df['Cat1']=(df == 5).idxmax(axis=1)

but it gives me only 1 column name for Whiskey.

Any help will be appreciated


Solution

  • Select and set all columns without first by DataFrame.iloc and numpy.where:

    df = df.iloc[:, :1].join(pd.DataFrame(np.where(df.iloc[:, 1:].eq(5),df.columns[1:],np.nan),
                                index=df.index, 
                                columns=[f'Cat{i}' for i,_ in enumerate(df.columns[1:], 1)]))
    print (df)
        DRINKS    Cat1  Cat2
    0  WHISKEY  STRONG  SOUR
    1    VODKA  STRONG   NaN
    2    WATER     NaN   NaN
        
    

    Or:

    df.iloc[:, 1:] = np.where(df.iloc[:, 1:].eq(5), df.columns[1:], np.nan)
    df = df.rename(columns=dict(zip(df.columns[1:],
                                   [f'Cat{i}' for i,_ in enumerate(df.columns[1:], 1)])))
    print (df)
        DRINKS    Cat1  Cat2
    0  WHISKEY  STRONG  SOUR
    1    VODKA  STRONG   NaN
    2    WATER     NaN   NaN