I have similar df like below,
df = pd.DataFrame({'DRINKS':['WHISKEY','VODKA','WATER'],
'STRONG':[5,5,0],
'SOUR':[5,4,0]})
And I want to transform it to this one (Read 5s from the dataframe and when it matches, create a column with whatever name(I named it Cat1) and get the column name(STRONG) where the value was 5, then move on tho the next column and do the same operation until there no columns with rows with a value 5. The final outcome should be like below:
df = pd.DataFrame({'DRINKS':['WHISKEY','VODKA','WATER'],
'Cat1':["STRONG","STRONG",np.nan],
'Cat2':["SOUR",np.nan,np.nan]})
I tried to do it with
df['Cat1']=(df == 5).idxmax(axis=1)
but it gives me only 1 column name for Whiskey.
Any help will be appreciated
Select and set all columns without first by DataFrame.iloc
and numpy.where
:
df = df.iloc[:, :1].join(pd.DataFrame(np.where(df.iloc[:, 1:].eq(5),df.columns[1:],np.nan),
index=df.index,
columns=[f'Cat{i}' for i,_ in enumerate(df.columns[1:], 1)]))
print (df)
DRINKS Cat1 Cat2
0 WHISKEY STRONG SOUR
1 VODKA STRONG NaN
2 WATER NaN NaN
Or:
df.iloc[:, 1:] = np.where(df.iloc[:, 1:].eq(5), df.columns[1:], np.nan)
df = df.rename(columns=dict(zip(df.columns[1:],
[f'Cat{i}' for i,_ in enumerate(df.columns[1:], 1)])))
print (df)
DRINKS Cat1 Cat2
0 WHISKEY STRONG SOUR
1 VODKA STRONG NaN
2 WATER NaN NaN