Good afternoon, I would like to drop these observations from my waterfall, however only the first observation is shown in the output. I would like to drop all observations in SID and PID, which is why I chose the ridiculous ratio.
Please advise. Thank you!
Also, how do I allow for more room in the drop_condition observation column in the output box. Sometimes it cuts of my text. Thanks!
data temp;
set mydata.ames_housing_data;
format drop_condition $30.;
if (SID in (0:10000000)) then drop_condition = '01: SID';
else if (PID in (0:10000000)) then drop_condition = '02: PID';
else if (Neighborhood) then drop_condition = '03: Neighborhood';
else if (Zoning in ('A', 'C', 'FV', 'I')) then drop_condition = '04: Non-Residential Zoning';
else drop_condition = '05: Sample Population';
run;
proc freq data=temp;
tables drop_condition;
title 'Sample Waterfall';
run; quit;
Log
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
55
56 data temp;
57 set mydata.ames_housing_data;
NOTE: Data file MYDATA.AMES_HOUSING_DATA.DATA is in a format that is native to another host, or the file encoding does not match
the session encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might
reduce performance.
58 format drop condition $30.;
59
60 if (SID in (0:10000000)) then drop_condition = '01: SID';
61 else if (PID in (0:10000000)) then drop_condition = '02: PID';
62 else if (Neighborhood) then drop_condition = '03: Neighborhood';
63 else if (Zoning in ('A', 'C', 'FV', 'I')) then drop_condition = '04: Non Residential Zoning';
64 else drop_condition = '05: Sample Population';
65
66 run;
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
62:10
NOTE: Variable drop is uninitialized.
NOTE: Variable condition is uninitialized.
NOTE: There were 2930 observations read from the data set MYDATA.AMES_HOUSING_DATA.
NOTE: The data set WORK.TEMP has 2930 observations and 85 variables.
NOTE: DATA statement used (Total process time):
real time 0.05 seconds
cpu time 0.06 seconds
67
68 proc freq data=temp;
69 tables drop_condition;
70 title 'Sample Waterfall';
71 run;
NOTE: There were 2930 observations read from the data set WORK.TEMP.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.06 seconds
cpu time 0.06 seconds
71 ! quit;
72
73 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
85
Since you haven't use a missing in your PROC FREQ it means that all your data falls into your first condition. IF conditions with IF/ELSE IF are evaluated in order and stops at the first true condition.
You can check the distribution of your variable with a proc freq or proc means.
proc means data=MYDATA.AMES_HOUSING_DATA min max;
var SID;
run;
EDIT: I think your third condition would work as is, but to avoid the Note in the log and for better code I would suggest using:
Else if not missing(neighbourhood) then...