I am new to pandas. I am trying to split a single column to multiple columns based on index value using Groupby. Below is the program wrote.
import pandas as pd
data = [(0,1.1),
(1,1.2),
(2,1.3),
(0,2.1),
(1,2.2),
(0,3.1),
(1,3.2),
(2,3.3),
(3,3.4)]
df = pd.DataFrame(data, columns=['ID','test_data'])
df = df.groupby('ID',sort=True).apply(lambda g: pd.Series(g['test_data'].values))
print(df)
df=df.unstack(level=-1).rename(columns=lambda x: 'test_data%s' %x)
print(df)
I have to use unstack(level=-1) because when we have uneven column size, the groupie and series stores the result as shown below.
ID
0 0 1.1
1 2.1
2 3.1
1 0 1.2
1 2.2
2 3.2
2 0 1.3
1 3.3
3 0 3.4
dtype: float64
End result I am getting after unstack is like below
test_data0 test_data1 test_data2
ID
0 1.1 2.1 3.1
1 1.2 2.2 3.2
2 1.3 3.3 NaN
3 3.4 NaN NaN
but what I am expecting is
test_data0 test_data1 test_data2
ID
0 1.1 2.1 3.1
1 1.2 2.2 3.2
2 1.3 NAN 3.3
3 NAN NAN 3.4
Let me know if there is any better way to do this other than groupby.
This will work if your dataframe is sorted as you show
df['num_zeros_seen'] = df['ID'].eq(0).cumsum()
#reshape the table
df = df.pivot(
index='ID',
columns='num_zeros_seen',
values='test_data',
)
print(df)
Output:
num_zeros_seen 1 2 3
ID
0 1.1 2.1 3.1
1 1.2 2.2 3.2
2 1.3 NaN 3.3
3 NaN NaN 3.4