
Python/Pandas equivalent of SAS Proc Summary procedure


I am an experienced SAS programmer and am converting to Python/Pandas. I frequently use PROC SUMMARY in SAS to create summarized data files that I later combine with other files in subsequent steps of a SAS program. PROC SUMMARY is very powerful, easy to use, and straightforward to code. I have not yet found a comparable method in Pandas that is as powerful, easy to use, and straightforward to code. As I am new to Python/Pandas, I was wondering whether such a method exists.

The following step creates a simple output file with 9 columns and one row for every unique combination of age_category and gender.

proc summary data='input file' nway;
 class age_category gender;
 var weight_kg height_cm;
 output out='output file'
   mean(weight_kg) = weight_avge
   max(weight_kg) = weight_max
   min(weight_kg) = weight_min
   mean(height_cm) = height_avge
   max(height_cm) = height_max
   min(height_cm) = height_min
   n(height_cm) = n_of_cases
  ; 
run; 

I am trying to do the same thing in Pandas with the summarized data being output to a data frame.


Solution

  • In pandas, first group by age_category and gender, then aggregate the analysis columns with statistical functions, for example:

    dt = df.groupby(['age_category', 'gender'])[['weight_kg', 'height_cm']].agg(['mean', 'max', 'min', 'count'])
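To get one flat column per statistic, with the same output names as the PROC SUMMARY step in the question, pandas named aggregation is one option. The sketch below is illustrative only: the sample DataFrame is made up, and `df` and `summary` are hypothetical names. The column names come from the question. Note that `count` ignores missing values, much like `n()` in SAS.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question's SAS step.
df = pd.DataFrame({
    "age_category": ["adult", "adult", "adult", "child", "child"],
    "gender": ["F", "F", "M", "F", "M"],
    "weight_kg": [60.0, 70.0, 80.0, 30.0, 35.0],
    "height_cm": [165.0, 175.0, 180.0, 130.0, 135.0],
})

# Named aggregation: output_name=(input_column, statistic).
summary = (
    df.groupby(["age_category", "gender"])
      .agg(
          weight_avge=("weight_kg", "mean"),
          weight_max=("weight_kg", "max"),
          weight_min=("weight_kg", "min"),
          height_avge=("height_cm", "mean"),
          height_max=("height_cm", "max"),
          height_min=("height_cm", "min"),
          n_of_cases=("height_cm", "count"),  # non-missing values only
      )
      .reset_index()  # turn the group keys back into ordinary columns
)

print(summary)
```

The result is an ordinary DataFrame with 9 columns (the two grouping keys plus the seven statistics), so it can be merged with other DataFrames in later steps, for example with `pd.merge`.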