python-3.x pandas dataframe pandas-groupby series

python pandas data frame: single column to multiple columns based on values

I am new to pandas. I am trying to split a single column to multiple columns based on index value using Groupby. Below is the program wrote.

import pandas as pd
data = [(0,1.1),
        (1,1.2),
        (2,1.3),
        (0,2.1),
        (1,2.2),
        (0,3.1),
        (1,3.2),
        (2,3.3),
        (3,3.4)]
df = pd.DataFrame(data, columns=['ID','test_data'])
df = df.groupby('ID',sort=True).apply(lambda g: pd.Series(g['test_data'].values))

print(df)
df=df.unstack(level=-1).rename(columns=lambda x: 'test_data%s' %x)
print(df)

I have to use unstack(level=-1) because when we have uneven column size, the groupie and series stores the result as shown below.

ID   
0   0    1.1
    1    2.1
    2    3.1
1   0    1.2
    1    2.2
    2    3.2
2   0    1.3
    1    3.3
3   0    3.4
dtype: float64

End result I am getting after unstack is like below

    test_data0  test_data1  test_data2
ID                                    
0          1.1         2.1         3.1
1          1.2         2.2         3.2
2          1.3         3.3         NaN
3          3.4         NaN         NaN

but what I am expecting is

   test_data0 test_data1  test_data2
ID                                  
0         1.1        2.1           3.1
1         1.2        2.2           3.2
2         1.3        NAN           3.3
3         NAN        NAN           3.4

Let me know if there is any better way to do this other than groupby.

Solution

This will work if your dataframe is sorted as you show

df['num_zeros_seen'] = df['ID'].eq(0).cumsum()

#reshape the table
df = df.pivot(
    index='ID',
    columns='num_zeros_seen',
    values='test_data',
)

print(df)

Output:

num_zeros_seen    1    2    3
ID                           
0               1.1  2.1  3.1
1               1.2  2.2  3.2
2               1.3  NaN  3.3
3               NaN  NaN  3.4