I have a pandas dataframe that has one row per object. Within that object, there are subobjects. I want to create a dataframe which contains one row per subobject.
I've read stuff on melt but can't begin to work out how to use it for what I want to do.
I want to go from
ObjectID Sub1_ID Sub1_Var1 Sub1_Var2 Sub1_Var3 Sub2_ID Sub2_Var1 Sub2_Var2 Sub2_Var3
1 98398 3 10 9 19231 6 7 5
2 87868 8 5 4
3 4579 5 6 6 24833 6 2 2
4 2514 1 6 9
to
ObjectID Sub_ID Var1 Var2 Var3
1 98398 3 10 9
1 19231 6 7 5
2 87868 8 5 4
3 4579 5 6 6
3 24833 6 2 2
4 2514 1 6 9
One way you can do this is using MultiIndex with from_arrays
and then use stack
to reshape the dataframe:
df1 = df.set_index('ObjectID')
df1.columns = pd.MultiIndex.from_arrays(zip(*df1.columns.str.split('_')))
df1.stack(0).reset_index().drop('level_1', axis=1)
Output:
ObjectID ID Var1 Var2 Var3
0 1 98398.0 3.0 10.0 9.0
1 1 19231.0 6.0 7.0 5.0
2 2 87868.0 8.0 5.0 4.0
3 3 4579.0 5.0 6.0 6.0
4 3 24833.0 6.0 2.0 2.0
5 4 2514.0 1.0 6.0 9.0