Search code examples
rdataframer-faq

Select the row with the maximum value in each group


In a dataset with multiple observations for each subject. For each subject I want to select the row which have the maximum value of 'pt'. For example, with a following dataset:

ID    <- c(1,1,1,2,2,2,2,3,3)
Value <- c(2,3,5,2,5,8,17,3,5)
Event <- c(1,1,2,1,2,1,2,2,2)

group <- data.frame(Subject=ID, pt=Value, Event=Event)
#   Subject pt Event
# 1       1  2     1
# 2       1  3     1
# 3       1  5     2 # max 'pt' for Subject 1
# 4       2  2     1
# 5       2  5     2
# 6       2  8     1
# 7       2 17     2 # max 'pt' for Subject 2
# 8       3  3     2
# 9       3  5     2 # max 'pt' for Subject 3

Subject 1, 2, and 3 have the biggest pt value of 5, 17, and 5 respectively.

How could I first find the biggest pt value for each subject, and then, put this observation in another data frame? The resulting data frame should only have the biggest pt values for each subject.


Solution

  • Here's a data.table solution:

    require(data.table) ## 1.9.2
    group <- as.data.table(group)
    

    If you want to keep all the entries corresponding to max values of pt within each group:

    group[group[, .I[pt == max(pt)], by=Subject]$V1]
    #    Subject pt Event
    # 1:       1  5     2
    # 2:       2 17     2
    # 3:       3  5     2
    

    If you'd like just the first max value of pt:

    group[group[, .I[which.max(pt)], by=Subject]$V1]
    #    Subject pt Event
    # 1:       1  5     2
    # 2:       2 17     2
    # 3:       3  5     2
    

    In this case, it doesn't make a difference, as there aren't multiple maximum values within any group in your data.