Search code examples
sasreportproc

SAS PROC REPORT Conditional Output in a Column


I am trying to learn SAS and specifically PROC REPORT. I am using SASHELP.CARS dataset.

What I want to achieve in the 6th column of the output, labelled as 'Number of Cars > Mean(Invoice)' to compute number of cars whose Invoice is greater than the Group's mean of Invoice. I am using the code below.

PROC REPORT DATA=sashelp.CARS NOWD OUT=learning.MyFirstReport;
COLUMNS Type Origin INVOICE=Max_INVOICE INVOICE=Mean_Invoice 
INVOICE=Count_Invoice TEST DriveTrain;
DEFINE Type / Group 'Type of Car' CENTER;
DEFINE Origin / Group 'Origin of Car' CENTER;
DEFINE Max_Invoice / ANALYSIS MAX 'Max of Invoice';
DEFINE Mean_Invoice / ANALYSIS MEAN 'Mean of Invoice';
DEFINE Count_Invoice / ANALYSIS N FORMAT=5.0 'Total Number of Cars' center;
DEFINE DriveTrain / ACROSS 'Type of DriveTrain of Car';
DEFINE TEST / COMPUTED 'Number of Cars > Mean(Invoice)' center;
COMPUTE TEST;
     TEST=N(_c7_>Mean_Invoice);
ENDCOMP;
RUN;

The Output that I am getting is in the image below.

Output of the above SAS code

I don't think that is the correct output since all the rows in the column show a value of 1. How do I get the desired output in the 6th column of the output?


Solution

  • The non group columns are being defined analysis for computing aggregate statistics. One way to achieve a count of a logical evaluation is to prep the data so that a SUM aggregation of an individual flag (0 or 1) is the count of positive assertions.

    Prepare

    proc sql;
      create view cars_v as
      select *
      , mean(invoice) as invoice_mean_over_type_origin
      , (invoice > calculated invoice_mean_over_type_origin) as flag_50
      from sashelp.cars
      group by type, origin
      ;
    

    Report

    PROC REPORT DATA=CARS_V OUT=work.MyFirstReport;
    COLUMNS 
      Type 
      Origin 
      INVOICE/*=Max_INVOICE */
      INVOICE=INVOICE_use_2/*=Mean_Invoice */
    
      flag_50
      flag_50=flag_50_use_2
    
      flag_50_other
      DriveTrain
    ;
      DEFINE Type / Group 'Type of Car' CENTER;
      DEFINE Origin / Group 'Origin of Car' CENTER;
      DEFINE Invoice / ANALYSIS MAX 'Max of Invoice';
      DEFINE Invoice_use_2 / ANALYSIS MEAN 'Mean of Invoice';
    
      DEFINE flag_50 / analysis sum  'Number of Cars > Mean ( Invoice )' center;
    
      DEFINE flag_50_use_2 / noprint analysis N  ;
      * noprint makes a hidden column whose value is available to compute blocks;
    
      DEFINE flag_50_other / computed 'Number of Cars <= Mean ( Invoice )' center;
      DEFINE DriveTrain / ACROSS 'Type of DriveTrain of Car';
    
      compute flag_50_other;
        flag_50_other = flag_50_use_2 - flag_50.sum;
      endcomp;
    RUN;
    
    • In newer versions of SAS NOWD is a default option. New Proc REPORT code does not need to specified it explicitly.
    • Reusing a variable such as invoice=mean_invoice is ok, but a future reader of the code might have some misunderstanding when seeing the DEFINE Mean_Invoice / ANALYSIS MEAN 'Mean of Invoice'; line of code -- is the define for the mean or the mean of a mean ?