I have a dataset which has the following relavant variables, along with many others:
data have;
input group $ score weight;
datalines;
A 12 1.2
A 19 1
A 23 1.7
A 40 2
A 11 1
A 16 1.5
A 34 1.3
A 45 1
A 28 1.8
A 30 1.2
A 24 1.1
A 28 1
B 31 1.5
B 32 1.6
B 36 1.9
B 18 2
B 28 1.4
B 25 1.3
B 22 1.8
B 27 1.6
B 31 1.5
B 42 1.7
;
run;
I am used to do ranked groups in SAS using the proc rank statement, for example:
proc rank data=have groups=4 out=want;
by group;
var score;
ranks score_quartiles;
run;
But PROC RANK does not handle weights, and for what I researched, I need to use proc univariate, transpose the results, erase duplicated ties, and merge back to the original dataset. But I am not being able to figure out how to do so.
Any help is really appreciated.
Rather than trying to generate a format why not just generate some IF/THEN logic from the output of PROC UNIVARIATE.
So first use PROC UNIVARIATE with the WEIGHT statement to generate the cutpoints you want to use for your rankings.
proc univariate data=have noprint;
by group;
var score;
weight weight;
output out=wide pctlpre=p_ pctlpts=25 to 75 by 25;
run;
proc transpose data=wide out=tall name=percentile ;
by group;
var p_: ;
run;
Then use those to generate IF/THEN logic to assign RANK based on GROUP and SCORE.
filename code temp;
data _null_;
set tall;
by group;
file code;
if first.group then do;
rank=0;
put 'if ' group= :$quote. 'then do;'
/ ' if missing(score) then rank=.; '
;
end;
rank+1;
put ' else if score < ' col1 'then ' rank= ';' ;
if last.group then do;
rank+1;
put ' else ' rank= ';' / 'end;' ;
end;
run;
Then run the code
data want;
set have;
%include code / source2;
run;
For your example data that leads to this data step:
833 data want;
834 set have;
835 %include code / source2;
NOTE: %INCLUDE (level 1) file CODE is file ...
836 +if group="A" then do;
837 + if missing(score) then rank=.;
838 + else if score < 19 then rank=1 ;
839 + else if score < 28 then rank=2 ;
840 + else if score < 34 then rank=3 ;
841 + else rank=4 ;
842 +end;
843 +if group="B" then do;
844 + if missing(score) then rank=.;
845 + else if score < 25 then rank=1 ;
846 + else if score < 31 then rank=2 ;
847 + else if score < 32 then rank=3 ;
848 + else rank=4 ;
849 +end;
NOTE: %INCLUDE (level 1) ending.
850 run;
NOTE: There were 22 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 22 observations and 4 variables.