How can I transfer a pandas dataframe with multi index columns to R? preferably, using a CSV file?
import pandas as pd
df = pd.DataFrame({'foo':[1,1,2], 'baz':['a', 'c', 'b'], 'bar':[0.2, 0.6, 0.9], 'x':[4, 0.6, 0.9]})
display(df)
df = df.groupby(['foo', 'baz']).describe()
display(df)
df.to_csv("test.csv")
generates the following CSV file:
,,bar,bar,bar,bar,bar,bar,bar,bar,x,x,x,x,x,x,x,x
,,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max
foo,baz,,,,,,,,,,,,,,,,
1,a,1.0,0.2,,0.2,0.2,0.2,0.2,0.2,1.0,4.0,,4.0,4.0,4.0,4.0,4.0
1,c,1.0,0.6,,0.6,0.6,0.6,0.6,0.6,1.0,0.6,,0.6,0.6,0.6,0.6,0.6
2,b,1.0,0.9,,0.9,0.9,0.9,0.9,0.9,1.0,0.9,,0.9,0.9,0.9,0.9,0.9
How can I retain this hierarchical structure? Or if not possible at least have R auto-concatenate the names to keep the information from the header
# as_index=False
df = df.groupby(['foo', 'baz'], as_index=False).describe()
# combine the column level names with a .
df.columns = df.columns.map('.'.join)
foo.count foo.mean foo.std foo.min foo.25% foo.50% foo.75% foo.max bar.count bar.mean bar.std bar.min bar.25% bar.50% bar.75% bar.max x.count x.mean x.std x.min x.25% x.50% x.75% x.max
1.0 1.0 NaN 1.0 1.0 1.0 1.0 1.0 1.0 0.2 NaN 0.2 0.2 0.2 0.2 0.2 1.0 4.0 NaN 4.0 4.0 4.0 4.0 4.0
1.0 1.0 NaN 1.0 1.0 1.0 1.0 1.0 1.0 0.6 NaN 0.6 0.6 0.6 0.6 0.6 1.0 0.6 NaN 0.6 0.6 0.6 0.6 0.6
1.0 2.0 NaN 2.0 2.0 2.0 2.0 2.0 1.0 0.9 NaN 0.9 0.9 0.9 0.9 0.9 1.0 0.9 NaN 0.9 0.9 0.9 0.9 0.9
# save to csv
df.to_csv('test.csv', index=False)