Search code examples
sassamplingwaterfall

SAS Drop Condition - Creating a Waterfall for my Sample


Good afternoon, I would like to drop these observations from my waterfall, however only the first observation is shown in the output. I would like to drop all observations in SID and PID, which is why I chose the ridiculous ratio.

Please advise. Thank you!

Also, how do I allow for more room in the drop_condition observation column in the output box. Sometimes it cuts of my text. Thanks!

data temp;
set mydata.ames_housing_data;
format drop_condition $30.;

if (SID in (0:10000000)) then drop_condition = '01: SID';
else if (PID in (0:10000000)) then drop_condition = '02: PID';
else if (Neighborhood) then drop_condition = '03: Neighborhood';
else if (Zoning in ('A', 'C', 'FV', 'I')) then drop_condition = '04: Non-Residential Zoning';
else drop_condition = '05: Sample Population';

run;

proc freq data=temp;
tables drop_condition;
title 'Sample Waterfall';
run; quit;

Output

Log

   1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
    55         
    56         data temp;
   57         set mydata.ames_housing_data;
   NOTE: Data file MYDATA.AMES_HOUSING_DATA.DATA is in a format that is        native to another host, or the file encoding does not match 
   the session encoding. Cross Environment Data Access will be used, which        might require additional CPU resources and might 
   reduce performance.
    58         format drop condition $30.;
    59         
    60         if (SID in (0:10000000)) then drop_condition = '01: SID';
    61         else if (PID in (0:10000000)) then drop_condition = '02: PID';
    62         else if (Neighborhood) then drop_condition = '03: Neighborhood';
    63         else if (Zoning in ('A', 'C', 'FV', 'I')) then drop_condition = '04: Non Residential Zoning';
    64         else drop_condition = '05: Sample Population';
    65         
    66         run;

    NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
   62:10   
    NOTE: Variable drop is uninitialized.
    NOTE: Variable condition is uninitialized.
    NOTE: There were 2930 observations read from the data set MYDATA.AMES_HOUSING_DATA.
    NOTE: The data set WORK.TEMP has 2930 observations and 85 variables.
    NOTE: DATA statement used (Total process time):
   real time           0.05 seconds
   cpu time            0.06 seconds


    67         
    68         proc freq data=temp;
    69         tables drop_condition;
    70         title 'Sample Waterfall';
    71         run;

    NOTE: There were 2930 observations read from the data set WORK.TEMP.
    NOTE: PROCEDURE FREQ used (Total process time):
   real time           0.06 seconds
   cpu time            0.06 seconds

    71       !      quit;

    72         
    73         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
    85         

Solution

  • Since you haven't use a missing in your PROC FREQ it means that all your data falls into your first condition. IF conditions with IF/ELSE IF are evaluated in order and stops at the first true condition.

    You can check the distribution of your variable with a proc freq or proc means.

    proc means data=MYDATA.AMES_HOUSING_DATA min max;
    var SID;
    run;
    

    EDIT: I think your third condition would work as is, but to avoid the Note in the log and for better code I would suggest using:

    Else if not missing(neighbourhood) then...