Search code examples
python-3.xpandasdataframepandas-groupbyseries

python pandas data frame: single column to multiple columns based on values


I am new to pandas. I am trying to split a single column to multiple columns based on index value using Groupby. Below is the program wrote.

import pandas as pd
data = [(0,1.1),
        (1,1.2),
        (2,1.3),
        (0,2.1),
        (1,2.2),
        (0,3.1),
        (1,3.2),
        (2,3.3),
        (3,3.4)]
df = pd.DataFrame(data, columns=['ID','test_data'])
df = df.groupby('ID',sort=True).apply(lambda g: pd.Series(g['test_data'].values))

print(df)
df=df.unstack(level=-1).rename(columns=lambda x: 'test_data%s' %x)
print(df)

I have to use unstack(level=-1) because when we have uneven column size, the groupie and series stores the result as shown below.

ID   
0   0    1.1
    1    2.1
    2    3.1
1   0    1.2
    1    2.2
    2    3.2
2   0    1.3
    1    3.3
3   0    3.4
dtype: float64

End result I am getting after unstack is like below

    test_data0  test_data1  test_data2
ID                                    
0          1.1         2.1         3.1
1          1.2         2.2         3.2
2          1.3         3.3         NaN
3          3.4         NaN         NaN

but what I am expecting is

   test_data0 test_data1  test_data2
ID                                  
0         1.1        2.1           3.1
1         1.2        2.2           3.2
2         1.3        NAN           3.3
3         NAN        NAN           3.4

Let me know if there is any better way to do this other than groupby.


Solution

  • This will work if your dataframe is sorted as you show

    df['num_zeros_seen'] = df['ID'].eq(0).cumsum()
    
    #reshape the table
    df = df.pivot(
        index='ID',
        columns='num_zeros_seen',
        values='test_data',
    )
    
    print(df)
    

    Output:

    num_zeros_seen    1    2    3
    ID                           
    0               1.1  2.1  3.1
    1               1.2  2.2  3.2
    2               1.3  NaN  3.3
    3               NaN  NaN  3.4