R: Identify highest ranking observation in groups of unequal size

The following code generates 2-level groups (by State within Test), then ranks each observation within each group based on ascending order of Grade. School is the tie-breaker.

School<-rep(c("A","B","C","D"),each=10)
State<-rep(c("NY","NJ"),times=20)
Test<-rep(c("LSAT", "MCAT", "GRE","TOEFL","ACT"), times=8)
Grade<-trunc(rep((seq(from=500, to=600,length.out=4))))
dat<-data.frame(Test,State,School,Grade)
library(plyr)
dat<-ddply(dat, .(Test, State),transform,num=rank(Grade,ties.method="first"))

I convert the first-ranked item within each group to "lowest" using the following code:

dat$num[dat$num==1]<-"lowest"

In this sample df, the number of items per group is always 4, so I can convert the highest-ranked item in each group to "highest" using the following code:

dat$num[dat$num==4]<-"highest"

But how can I tag observations with "highest" when the number of rows is not constant across all groups? The following code creates a version of the df with two additional rows in one of the groups.

School<-rep(c("A","B","C","D"),each=10)
State<-rep(c("NY","NJ"),times=20)
Test<-rep(c("LSAT", "MCAT", "GRE","TOEFL","ACT"), times=8)
Grade<-trunc(rep((seq(from=500, to=600,length.out=4))))
dat1<-data.frame(Test,State,School,Grade) 
dat1<-rbind(dat1,
     data.frame(Test="ACT",State="NJ",School="E",Grade=550),
     data.frame(Test="ACT",State="NJ",School="F",Grade=650))
library(plyr)
dat1<-ddply(dat1, .(Test, State),transform,num=rank(Grade,ties.method="first"))

Solution

You can do that by checking which is the highest/lowest in each group and assign highest/lowest to those rows. Here, I use ddply to do that since you already use plyr in your code:

dat1 <- ddply(dat1, .(Test, State), transform, num=ifelse(num == max(num), "highest", 
                                                          ifelse(num == min(num), "lowest", num)))

> dat1
    Test State School Grade     num
1    ACT    NJ      A   533  lowest
2    ACT    NJ      B   600       4
3    ACT    NJ      C   533       2
4    ACT    NJ      D   600       5
5    ACT    NJ      E   550       3
6    ACT    NJ      F   650 highest
7    ACT    NY      A   500  lowest
8    ACT    NY      B   566       3
9    ACT    NY      C   500       2
10   ACT    NY      D   566 highest
11   GRE    NJ      A   600       3
12   GRE    NJ      B   533  lowest
13   GRE    NJ      C   600 highest
14   GRE    NJ      D   533       2
15   GRE    NY      A   566       3
16   GRE    NY      B   500  lowest
17   GRE    NY      C   566 highest
18   GRE    NY      D   500       2
19  LSAT    NJ      A   533  lowest
20  LSAT    NJ      B   600       3
21  LSAT    NJ      C   533       2
22  LSAT    NJ      D   600 highest
23  LSAT    NY      A   500  lowest
24  LSAT    NY      B   566       3
25  LSAT    NY      C   500       2
26  LSAT    NY      D   566 highest
27  MCAT    NJ      A   533  lowest
28  MCAT    NJ      B   600       3
29  MCAT    NJ      C   533       2
30  MCAT    NJ      D   600 highest
31  MCAT    NY      A   566       3
32  MCAT    NY      B   500  lowest
33  MCAT    NY      C   566 highest
34  MCAT    NY      D   500       2
35 TOEFL    NJ      A   600       3
36 TOEFL    NJ      B   533  lowest
37 TOEFL    NJ      C   600 highest
38 TOEFL    NJ      D   533       2
39 TOEFL    NY      A   500  lowest
40 TOEFL    NY      B   566       3
41 TOEFL    NY      C   500       2
42 TOEFL    NY      D   566 highest

If your data is large enough, you could also consider using dplyr or data.table which will be faster than plyr.