Tags: python, pandas, dataframe, repeat, melt

How to melt a dataframe so repeated items become the values that correspond to the index


I have this dataframe:

import pandas as pd

df = pd.DataFrame({'Status':['CO','AD','AD','AD','OT','CO','OT','AD'],
                   'Mutation':['H157Y','R47H','R47H','R67H','R62H','D87N','D39E','D39E']})
print(df)
  
  Status Mutation
0     CO    H157Y
1     AD     R47H
2     AD     R47H
3     AD     R67H
4     OT     R62H
5     CO     D87N
6     OT     D39E
7     AD     D39E

I want the dataframe to look like this:

df2 = pd.DataFrame({'Status':['CO','AD','OT'],'H157Y':[1,0,0],'R47H':[0,2,0],'R67H':[0,1,0],
                    'R62H':[0,0,1],'D87N':[1,0,0],'D39E':[0,1,1]})
print(df2)

  Status  H157Y  R47H  R67H  R62H  D87N  D39E
0     CO      1     0     0     0     1     0
1     AD      0     2     1     0     0     1
2     OT      0     0     0     1     0     1

The mutations become the column names, and each value is the number of times that mutation occurs for the corresponding status.


Solution

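  • Group by both columns, count the rows in each group with size(), then move Mutation into the columns with unstack(fill_value=0) so that combinations that never occur become 0: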

    df.groupby(['Status', 'Mutation']).size().unstack(fill_value=0)
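
Note that this one-liner leaves Status as the index and sorts both the rows and the columns alphabetically. The sketch below is one possible way to get the layout asked for in the question; it assumes "first appearance in df" is the desired order for both statuses and mutations, and the names counts, result and alt are only illustrative. It also shows pd.crosstab, which produces the same counts in a single call.

```python
import pandas as pd

df = pd.DataFrame({
    'Status': ['CO', 'AD', 'AD', 'AD', 'OT', 'CO', 'OT', 'AD'],
    'Mutation': ['H157Y', 'R47H', 'R47H', 'R67H', 'R62H', 'D87N', 'D39E', 'D39E'],
})

# Count rows per (Status, Mutation) pair, then pivot Mutation into columns.
counts = df.groupby(['Status', 'Mutation']).size().unstack(fill_value=0)

# groupby sorts both axes alphabetically; reindex to restore
# the order in which each label first appears in df (an assumption
# about the desired layout).
counts = counts.reindex(index=df['Status'].unique(),
                        columns=df['Mutation'].unique())

# Drop the leftover 'Mutation' label on the columns axis and turn
# Status back into an ordinary column.
result = counts.rename_axis(columns=None).reset_index()
print(result)

# Alternative: crosstab computes the same frequency table directly
# (rows and columns sorted alphabetically, Status as the index).
alt = pd.crosstab(df['Status'], df['Mutation'])
```

If the alphabetical order is acceptable, the reindex step can be dropped and only reset_index() is needed to make Status a regular column.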