Search code examples
pythonpivot-tabledata-transform

Reshaping a Dataframe with a column having numeric and non-numeric value stored as Object Datatype


I want to reshape the input dataframe to output dataframe shape as mentioned below.

Input Dataframe

    ID  Parameter   Value
0   1001    Name            Peter
1   1001    Name            Pete
2   1001    Name            J. Pete
3   1001    ShoeSize    A
4   1001    ShoeSize    A
5   1001    BrainSize   32
6   1001    BrainSize   41
7   1002    Name            Frank
8   1002    ShoeSize    C
9   1002    ShoeSize    A
10  1002    BrainSize   52
11  1002    BrainSize   41

Output Dataframe

ID  BrainSize   Name    ShoeSize    
1001    36.5            Peter   A
1002    46.5            Frank   C,A

I did

import pandas as pd
import numpy as np
df=pd.read_csv('input.csv')
df
    ID  Parameter   Value
0   1001    Name    Peter
1   1001    Name    Pete
2   1001    Name    J. Pete
3   1001    ShoeSize    A
4   1001    ShoeSize    A
5   1001    BrainSize   32
6   1001    BrainSize   41
7   1002    Name    Frank
8   1002    ShoeSize    C
9   1002    ShoeSize    A
10  1002    BrainSize   52
11  1002    BrainSize   41
df1=pd.pivot_table(df,index='ID',columns='Parameter',values='Value',aggfunc=sum)
df1
Parameter   BrainSize   Name    ShoeSize
ID          
1001    3241    PeterPeteJ. Pete    AA
1002    5241    Frank   CA
for i in range (len(df1)):
    x=pd.to_numeric(df1.iloc[i,df1.columns.get_loc("BrainSize")][:2])+pd.to_numeric(df1.iloc[i,df1.columns.get_loc("BrainSize")][2:])
    df1.iloc[i,0]=x/2
    y=df1.iloc[i,df1.columns.get_loc("Name")][:5]
    df1.iloc[i,1]=y
    
for j in range(len(df1)):
    if(df1.iloc[j,df1.columns.get_loc('ShoeSize')][0]==df1.iloc[j,df1.columns.get_loc('ShoeSize')][1]):
        df1.iloc[j,2]=df1.iloc[j,df1.columns.get_loc('ShoeSize')][0]
    else:
        df1.iloc[j,2]=df1.iloc[j,df1.columns.get_loc('ShoeSize')][0]+','+df1.iloc[j,df1.columns.get_loc('ShoeSize')][1]
df1

Parameter   BrainSize   Name    ShoeSize
ID          
1001    36.5    Peter   A
1002    46.5    Frank   C,A

I am sure there are better ways to do it. Can you please help !!!


Solution

  • Use:

    join = lambda x: ','.join(x.dropna())
    (df.assign(idx2=lambda d: d.groupby(['ID', 'Parameter']).cumcount())
       .pivot(index=['ID', 'idx2'], columns='Parameter', values='Value')
       .astype({'BrainSize': float})
       .groupby(level=0).agg({'BrainSize': 'mean', 'Name': join, 'ShoeSize': join})
    )
    

    output:

    Parameter  BrainSize                Name ShoeSize
    ID                                               
    1001            36.5  Peter,Pete,J. Pete      A,A
    1002            46.5               Frank      C,A