I am looking to summarize the following table, df1, pandas data frame:
name subname value1 value2 value3
0 a b 1 1 1
1 a b 2 2 2
2 a b 3 3 3
3 a b 4 4 4
4 a b 5 5 5
to obtain in df2 format:
name subname type mean sd
0 a b value1 3 3
1 a b value2 3 3
2 a b value3 3 3
Reproducible input:
df1 = pd.DataFrame({'name': ['a','a','a','a','a'],
'subname': ['b','b','b','b','b'],
'value1': [1,2,3,4,5],
'value2': [1,2,3,4,5],
'value3': [1,2,3,4,5]})
df2 = pd.DataFrame({'name': ['a','a','a'],
'subname': ['b','b','b'],
'type': ['value1','value2','value3'],
'mean': [3,3,3],
'sd': [3,3,3]})
You can melt
and groupby.agg
:
out = (df1.melt(['name', 'subname'], var_name='type')
.groupby(['name', 'subname', 'type'], as_index=False)['value']
.agg(['mean', 'std'])
)
Alternatively, with a pivot_table
and stack
:
out = (df1.pivot_table(index=['name', 'subname'], aggfunc=['mean', 'std'])
.rename_axis(columns=[None, 'type']).stack().reset_index()
)
Output:
name subname type mean std
0 a b value1 3.0 1.581139
1 a b value2 3.0 1.581139
2 a b value3 3.0 1.581139