Search code examples
pythonpandassummarize

How can I summarize the following pandas data frame (df1) to obtain (df2)


I am looking to summarize the following table, df1, pandas data frame:

  name subname  value1  value2  value3
0    a       b       1       1       1
1    a       b       2       2       2
2    a       b       3       3       3
3    a       b       4       4       4
4    a       b       5       5       5

to obtain in df2 format:

  name subname    type  mean  sd
0    a       b  value1     3   3
1    a       b  value2     3   3
2    a       b  value3     3   3

Reproducible input:

df1 = pd.DataFrame({'name': ['a','a','a','a','a'],
                    'subname': ['b','b','b','b','b'],
                    'value1': [1,2,3,4,5],
                    'value2': [1,2,3,4,5],
                    'value3': [1,2,3,4,5]})

df2 = pd.DataFrame({'name': ['a','a','a'],
                    'subname': ['b','b','b'],
                    'type': ['value1','value2','value3'],
                    'mean': [3,3,3],
                    'sd': [3,3,3]})

Solution

  • You can melt and groupby.agg:

    out = (df1.melt(['name', 'subname'], var_name='type')
              .groupby(['name', 'subname', 'type'], as_index=False)['value']
              .agg(['mean', 'std'])
           )
    

    Alternatively, with a pivot_table and stack:

    out = (df1.pivot_table(index=['name', 'subname'], aggfunc=['mean', 'std'])
              .rename_axis(columns=[None, 'type']).stack().reset_index()
          )
    

    Output:

      name subname    type  mean       std
    0    a       b  value1   3.0  1.581139
    1    a       b  value2   3.0  1.581139
    2    a       b  value3   3.0  1.581139