Search code examples
pythonrpandasmulti-index

transfer a pandas dataframe with multi_index columns to R


How can I transfer a pandas dataframe with multi index columns to R? preferably, using a CSV file?

import pandas as pd

df = pd.DataFrame({'foo':[1,1,2], 'baz':['a', 'c', 'b'], 'bar':[0.2, 0.6, 0.9], 'x':[4, 0.6, 0.9]})
display(df)
df = df.groupby(['foo', 'baz']).describe()
display(df)

results in: enter image description here

df.to_csv("test.csv") generates the following CSV file:

,,bar,bar,bar,bar,bar,bar,bar,bar,x,x,x,x,x,x,x,x
,,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max
foo,baz,,,,,,,,,,,,,,,,
1,a,1.0,0.2,,0.2,0.2,0.2,0.2,0.2,1.0,4.0,,4.0,4.0,4.0,4.0,4.0
1,c,1.0,0.6,,0.6,0.6,0.6,0.6,0.6,1.0,0.6,,0.6,0.6,0.6,0.6,0.6
2,b,1.0,0.9,,0.9,0.9,0.9,0.9,0.9,1.0,0.9,,0.9,0.9,0.9,0.9,0.9

How can I retain this hierarchical structure? Or if not possible at least have R auto-concatenate the names to keep the information from the header


Solution

    • Combine the multi-level name, then save to csv.
    # as_index=False
    df = df.groupby(['foo', 'baz'], as_index=False).describe()
    
    # combine the column level names with a .
    df.columns = df.columns.map('.'.join)
    
     foo.count  foo.mean  foo.std  foo.min  foo.25%  foo.50%  foo.75%  foo.max  bar.count  bar.mean  bar.std  bar.min  bar.25%  bar.50%  bar.75%  bar.max  x.count  x.mean  x.std  x.min  x.25%  x.50%  x.75%  x.max
           1.0       1.0      NaN      1.0      1.0      1.0      1.0      1.0        1.0       0.2      NaN      0.2      0.2      0.2      0.2      0.2      1.0     4.0    NaN    4.0    4.0    4.0    4.0    4.0
           1.0       1.0      NaN      1.0      1.0      1.0      1.0      1.0        1.0       0.6      NaN      0.6      0.6      0.6      0.6      0.6      1.0     0.6    NaN    0.6    0.6    0.6    0.6    0.6
           1.0       2.0      NaN      2.0      2.0      2.0      2.0      2.0        1.0       0.9      NaN      0.9      0.9      0.9      0.9      0.9      1.0     0.9    NaN    0.9    0.9    0.9    0.9    0.9
    
    # save to csv
    df.to_csv('test.csv', index=False)