I have a table with very many columns but for the in order to explain my problem I will use this simple table.
data test;
input a b c;
datalines;
0 0 0
1 1 1
. 4 2
;
run;
I need to calculate the common summary statistic as min, max and number of missing. But I also need to calculate some special numbers as number of values above a certain level( in this example >0 and >1.
I can use proc mean but it only give me results for normal things like min, max etc.
What I want is result on the following format:
var minval maxval nmiss n_above1 n_above2
a 0 1 1 1 0
b 0 4 0 2 1
c 0 2 0 2 1
I have been able to make this informat for one variable with this rather stupid code:
data result;
set test(keep =b) end=last;
variable = 'b';
retain minval maxval;
if _n_ = 1 then do;
minval = 1e50;
maxval = -1e50;
end;
if minval > b then minval = b;
if maxval < b then maxval = b;
if b=. then nmiss+1;
if b>0 then n_above1+1;
if b>2 then n_above2+1;
if last then do;
output;
end;
drop b;
run;
This produce the following table:
variable minval maxval nmiss n_above1 n_above2
b 0 4 0 2 1
I know there has to be better way do this. I am used to Python and Pandas. There I will only loop through each variable, calculate the different summary statistick and append the result to a new dataframe for each variable.
I can probably also use proc sql. The next example
proc sql;
create table res as
select count(case when a > 0 then 1 end) as n_above1_a,
count(case when b > 0 then 1 end) as n_above1_b,
count(case when c > 0 then 1 end) as n_above1_c
from test;
quit;
This gives me:
n_above1_a n_above1_b n_above1_c
1 2 2
But this do not solve my problem.
If you add an unique identifier to each row then you can just use PROC TRANSPOSE and PROC SQL to get your result.
data test;
input a b c;
id+1;
datalines;
0 0 0
1 1 1
. 4 2
;
proc transpose data=test out=tall ;
by id ;
run;
proc sql noprint ;
create table want as
select _name_
, min(col1) as minval
, max(col1) as maxval
, sum(missing(col1)) as nmiss
, sum(col1>1) as n_above1
, sum(col1>2) as n_above2
from tall
group by _name_
;
quit;
Result
Obs _NAME_ minval maxval nmiss n_above1 n_above2
1 a 0 1 1 0 0
2 b 0 4 0 1 1
3 c 0 2 0 1 0