Search code examples
macrossasrecode

How to recode variables in table 1 using info from table 2 (in SAS)


The overal goal is to stratify quantitative variables based on their percentile. I would like to break it up into 10 levels (e.g. 10th, 20th, ...100th percentile) and recode it as 1 if it falls into the 10th percentile, 2 if it falls into the 20th percentile, etc. This method needs to be applicable across any data set I plug in and I want this process to be as automated as possible. Below I have generated some test data:

data test (drop=i);
do i=1 to 1000;
a=round(uniform(1)*4,.01);
b=round(uniform(1)*10,.01);
c=round(uniform(1)*7.5,.01);
output;
end;
stop;
run;

The following macro is used to create a table of values that tells you the cut off for the 10 percentiles of each variable. I have added a picture of the example output below the code.

/*Recode variables based on quartiles from boxplot*/
%macro percentiles(var);                                                                                                           
     /* Count the number of values in the strinrecode */                                                                                                                                   
     %let count=%sysfunc(countw(&var)); 
     /* Loop throurecodeh the total number of values */                                                                                         
     %do i = 1 %to &count;                                                                                                              
      %let variables=%qscan(&var,&i,%str(,));
proc univariate data=test noprint;
   var &variables;
   output out=pcts pctlpts  = 10 20 30 40 50 60 70 80 90 100
                    pctlpre  = &variables;
run;
proc transpose data=pcts out=&variables (rename=(col1=&variables) drop=_NAME_ _LABEL_);
run;                                                                                                                      
     %end; 
data percentiles (drop=i);
do i=1 to 10;
recode=i;
percentile=i*10;
output;
end;
stop;
run;

data pcts;
merge percentiles %sysfunc(tranwrd(&var.,%str(,),%str( ))); 
run;
%mend;  
%percentiles(%str(a,b,c)); 

output from above macro

The following code is how I am currently recoding my variables. I use the table generated in the above macro to fill in the cut off points for each percentile for each variable. As you can see, this is very tedious and will become prohibitive if I have a large number of variables to recode. Is there a better process for this or preferably a way I could automate this part?

data test;
set test;
if a <= .415 then recode_a = 1; else if a <= .785 then recode_a = 2; else if a <= 1.255 then recode_a = 3; 
else if a <= 1.61 then recode_a = 4;   else if a <= 2.03 then recode_a = 5; else if a <= 2.42 then recode_a = 6;   
else if a <= 2.76 then recode_a = 7; else if a <= 3.18 then recode_a = 8; else if a <= 3.64 then recode_a = 9; 
else if a <= 3.99 then recode_a = 10;   
if b <= .845 then recode_b = 1; else if b <= 1.88 then recode_b = 2; else if b <= 2.86 then recode_b = 3; 
else if b <= 4.005 then recode_b = 4;   else if b <= 5.03 then recode_b = 5; else if b <= 6.07 then recode_b = 6;   
else if b <= 6.995 then recode_b = 7; else if b <= 8.035 then recode_b = 8; else if b <= 9.16 then recode_b = 9; 
else if b <= 10 then recode_b = 10;  
if c <= .86 then recode_c = 1; else if c <= 1.58 then recode_c = 2; else if c <= 2.34 then recode_c = 3; 
else if c <= 3.15 then recode_c = 4;   else if c <= 3.85 then recode_c = 5; else if c <= 4.615 then recode_c = 6;   
else if c <= 5.315 then recode_c = 7; else if c <= 5.96 then recode_c = 8; else if c <= 6.75 then recode_c = 9; 
else if c <= 7.5 then recode_c = 10;
run; 

proc print data=test (obs=5);
run;

sample of desired output


Solution

  • A different option - PROC RANK. You could probably make it more 'automated' but it's pretty straightforward. Using PROC RANK you could also specify different ways of dealing with ties. Note that it would go from 0 to 9 rather than 1 to 10 but that's trivial to change.

    data test (drop=i);
    do i=1 to 1000;
    a=round(uniform(1)*4,.01);
    b=round(uniform(1)*10,.01);
    c=round(uniform(1)*7.5,.01);
    output;
    end;
    stop;
    run;
    
    proc rank data=test out=want groups=10;
    var a b c;
    ranks rankA rankB rankC;
    run;