Search code examples
rdplyr

How to select last N observation from each group in dplyr dataframe?


Given a dataframe:

df <- structure(list(a = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4), b = c(34, 
343, 54, 11, 55, 62, 59, -9, 0, -0.5)), row.names = c(NA, -10L
), class = c("tbl_df", "tbl", "data.frame"))

I want to take last N observations / rows from each group:

df %>% 
dplyr::group_by(a) %>% 
dplyr::last(2)

Gives me wrong results.

I want it to be:

a   b
1 343
1  54
2  55
2  62
3  59
3  -9
4   0
4  -0.5

Please advise what is wrong here?

The error I get is:

Error in order(order_by)[[n]] : subscript out of bounds


Solution

  • As it is a specific question based on dplyr

    1) after the group_by, use slice on the row_number()

    library(tidyverse)
    df %>% 
       group_by(a) %>% 
       slice(tail(row_number(), 2))
    # A tibble: 8 x 2
    # Groups:   a [4]
    #      a      b
    #  <dbl>  <dbl>
    #1     1  343  
    #2     1   54  
    #3     2   55  
    #4     2   62  
    #5     3   59  
    #6     3   -9  
    #7     4    0  
    #8     4   -0.5
    

    2) Or use filter from dplyr

    df %>% 
       group_by(a) %>% 
       filter(row_number() >= (n() - 1))
    

    3) or with do and tail

    df %>%
        group_by(a) %>% 
        do(tail(., 2))
    

    4) In addition to the tidyverse, methods, we can also use compact data.table

    library(data.table)
    setDT(df)[df[, .I[tail(seq_len(.N), 2)], a]$V1]
    

    5) Or by from base R

    by(df, df$a, FUN = tail, 2)
    

    6) or with aggregate from base R

    df[aggregate(c ~ a, transform(df, c = seq_len(nrow(df))), FUN = tail, 2)$c,]
    

    7) or with split from base R

    do.call(rbind, lapply(split(df, df$a), tail, 2))