Search code examples
sassas-macro

SAS: Most frequent value (Like a MODE) ties solved by recency?


I have data like this:

data mydata;
input ID $ Val $ Date;
datalines;
1  A  2010-12-01 
1  B  2010-12-03 
1  A  2010-12-04 
1  B  2010-12-08
2  X  2009-10-01 
2  X  2009-10-02 
2  Z  2009-10-03 
;
run;

I would like the mode returned where it exists. ID 1, however, doesn't have a true mode. In the case of ties where modes do not exist I would like the most recent Val to break the tie (as in id 1).

Desired OUTPUT:

ID Mode
1  B  
2  X  

I tried proc univariate (which only handles numeric modes, another problem) but this gives the dataset with mode null; which SAS has correct but is not the desired output. I would like to do this in a datastep.

CODE:

proc univariate data=mydata noprint;
class id;
var val;
output out=modetable mode=mode;
run;

OUTPUT:

ID Mode
1  
2  X

Solution

  • use IDgroup from proc means

    An example of this statement can be fount in Identifying the Top Three Extreme Values with the Output Statistics

    Let us extend the example data a little bit;

    data myInput;
        infile datalines dsd delimiter='09'x;
        input 
            @1 ID 1. 
            @4 Val $1. 
            @7 Date yymmdd10.;
        format Date yymmdd10.;
        datalines;
    2  X  2009-10-01
    2  X  2009-10-02
    2  Z  2009-10-03
    3  C  2010-10-01
    3  B  2010-10-03
    3  A  2010-10-04
    3  A  2010-12-01
    3  B  2010-12-03
    3  C  2010-12-04
    ;
    run;
    

    Now let us count the frequency and the last occurence of each ´Val´ for each ´ID´;

    proc sql;
        create view myView as 
        select ID, Val, max(Date) as Date format=yymmdd10., count(*) as freq
        from myInput
        group by ID, Val;
    run;
    

    And finally, retain one Val for each ID, prefering the more frequent one and within equally frequent ones the most recent one;

    proc means data=myView nway noprint;
       class ID;
       output out=myModes(keep= ID Mode)
              idgroup( max(freq Date) out[1] (Val)=Mode);
    run;
    
    proc print data=myModes;
    run;
    

    The result is;

    ID  Mode
    2   X
    3   C