I want to reshape the input dataframe to output dataframe shape as mentioned below.
Input Dataframe
ID Parameter Value
0 1001 Name Peter
1 1001 Name Pete
2 1001 Name J. Pete
3 1001 ShoeSize A
4 1001 ShoeSize A
5 1001 BrainSize 32
6 1001 BrainSize 41
7 1002 Name Frank
8 1002 ShoeSize C
9 1002 ShoeSize A
10 1002 BrainSize 52
11 1002 BrainSize 41
Output Dataframe
ID BrainSize Name ShoeSize
1001 36.5 Peter A
1002 46.5 Frank C,A
I did
import pandas as pd
import numpy as np
df=pd.read_csv('input.csv')
df
ID Parameter Value
0 1001 Name Peter
1 1001 Name Pete
2 1001 Name J. Pete
3 1001 ShoeSize A
4 1001 ShoeSize A
5 1001 BrainSize 32
6 1001 BrainSize 41
7 1002 Name Frank
8 1002 ShoeSize C
9 1002 ShoeSize A
10 1002 BrainSize 52
11 1002 BrainSize 41
df1=pd.pivot_table(df,index='ID',columns='Parameter',values='Value',aggfunc=sum)
df1
Parameter BrainSize Name ShoeSize
ID
1001 3241 PeterPeteJ. Pete AA
1002 5241 Frank CA
for i in range (len(df1)):
x=pd.to_numeric(df1.iloc[i,df1.columns.get_loc("BrainSize")][:2])+pd.to_numeric(df1.iloc[i,df1.columns.get_loc("BrainSize")][2:])
df1.iloc[i,0]=x/2
y=df1.iloc[i,df1.columns.get_loc("Name")][:5]
df1.iloc[i,1]=y
for j in range(len(df1)):
if(df1.iloc[j,df1.columns.get_loc('ShoeSize')][0]==df1.iloc[j,df1.columns.get_loc('ShoeSize')][1]):
df1.iloc[j,2]=df1.iloc[j,df1.columns.get_loc('ShoeSize')][0]
else:
df1.iloc[j,2]=df1.iloc[j,df1.columns.get_loc('ShoeSize')][0]+','+df1.iloc[j,df1.columns.get_loc('ShoeSize')][1]
df1
Parameter BrainSize Name ShoeSize
ID
1001 36.5 Peter A
1002 46.5 Frank C,A
I am sure there are better ways to do it. Can you please help !!!
Use:
join = lambda x: ','.join(x.dropna())
(df.assign(idx2=lambda d: d.groupby(['ID', 'Parameter']).cumcount())
.pivot(index=['ID', 'idx2'], columns='Parameter', values='Value')
.astype({'BrainSize': float})
.groupby(level=0).agg({'BrainSize': 'mean', 'Name': join, 'ShoeSize': join})
)
output:
Parameter BrainSize Name ShoeSize
ID
1001 36.5 Peter,Pete,J. Pete A,A
1002 46.5 Frank C,A