I have data like this:
data mydata;
input ID $ Val $ Date;
datalines;
1 A 2010-12-01
1 B 2010-12-03
1 A 2010-12-04
1 B 2010-12-08
2 X 2009-10-01
2 X 2009-10-02
2 Z 2009-10-03
;
run;
I would like the mode returned where it exists. ID 1, however, doesn't have a true mode. In the case of ties where modes do not exist I would like the most recent Val to break the tie (as in id 1).
Desired OUTPUT:
ID Mode
1 B
2 X
I tried proc univariate (which only handles numeric modes, another problem) but this gives the dataset with mode null; which SAS has correct but is not the desired output. I would like to do this in a datastep.
CODE:
proc univariate data=mydata noprint;
class id;
var val;
output out=modetable mode=mode;
run;
OUTPUT:
ID Mode
1
2 X
An example of this statement can be fount in Identifying the Top Three Extreme Values with the Output Statistics
Let us extend the example data a little bit;
data myInput;
infile datalines dsd delimiter='09'x;
input
@1 ID 1.
@4 Val $1.
@7 Date yymmdd10.;
format Date yymmdd10.;
datalines;
2 X 2009-10-01
2 X 2009-10-02
2 Z 2009-10-03
3 C 2010-10-01
3 B 2010-10-03
3 A 2010-10-04
3 A 2010-12-01
3 B 2010-12-03
3 C 2010-12-04
;
run;
Now let us count the frequency and the last occurence of each ´Val´ for each ´ID´;
proc sql;
create view myView as
select ID, Val, max(Date) as Date format=yymmdd10., count(*) as freq
from myInput
group by ID, Val;
run;
And finally, retain one Val for each ID, prefering the more frequent one and within equally frequent ones the most recent one;
proc means data=myView nway noprint;
class ID;
output out=myModes(keep= ID Mode)
idgroup( max(freq Date) out[1] (Val)=Mode);
run;
proc print data=myModes;
run;
The result is;
ID Mode
2 X
3 C