I am an experienced SAS programmer and am converting to Python/Pandas. I frequently use PROC SUMMARY in my work in SAS to create summarized data files that I can then combine with other files at later steps in SAS programs. PROC SUMMARY is very powerful, easy to use, and straightforward to code. I have not yet found a comparable method in Pandas that is as powerful, easy to use, and straightforward to code. As I am new to Python/Pandas, I was wondering if there is a method that does this.
The SAS step below creates a simple output file with one row for every unique combination of age_category and gender, and 9 columns (the two class variables plus seven statistics).
proc summary data='input file' nway;
class age_category gender;
var weight_kg height_cm;
output out='output file'
mean(weight_kg) = weight_avge
max(weight_kg) = weight_max
min(weight_kg) = weight_min
mean(height_cm) = height_avge
max(height_cm) = height_max
min(height_cm) = height_min
n(height_cm) = n_of_cases
;
run;
I am trying to do the same thing in Pandas, with the summarized data output to a data frame.
In Pandas, first group by age_category and gender, then aggregate the analysis columns with statistical functions, for example:
dt = df.groupby(['age_category', 'gender'])[['weight_kg', 'height_cm']].agg(['mean', 'max', 'min', 'count'])
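Passing a list of function names to agg computes every statistic for every column and returns two-level column labels, which is not quite the flat, named output that the PROC SUMMARY step produces. A closer match is probably pandas named aggregation (available since pandas 0.25), where each output column is given as name=(source column, function). The sketch below assumes the column names from the SAS example; the sample data is made up purely for illustration.

```python
import pandas as pd

# Hypothetical sample data; column names follow the SAS example.
df = pd.DataFrame({
    "age_category": ["18-29", "18-29", "18-29", "30-44", "30-44"],
    "gender": ["F", "F", "M", "F", "F"],
    "weight_kg": [60.0, 64.0, 80.0, 70.0, 74.0],
    "height_cm": [165.0, 169.0, 180.0, 160.0, 170.0],
})

# as_index=False keeps the grouping keys as ordinary columns,
# similar to the class variables in the PROC SUMMARY output data set.
summary = df.groupby(["age_category", "gender"], as_index=False).agg(
    weight_avge=("weight_kg", "mean"),
    weight_max=("weight_kg", "max"),
    weight_min=("weight_kg", "min"),
    height_avge=("height_cm", "mean"),
    height_max=("height_cm", "max"),
    height_min=("height_cm", "min"),
    n_of_cases=("height_cm", "count"),  # counts non-missing values
)

print(summary)
```

The result is an ordinary data frame with one row per age_category/gender combination and 9 flat columns, so it can be passed to pd.merge in later steps. By default groupby drops rows whose grouping key is missing; pass dropna=False to groupby (pandas 1.1 or later) if those rows should be kept as their own group.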