
Extracting an unequal number of rows by group in R


I have a two-variable data frame grouped by Shape and would like to extract the first n rows of each group, where n differs for each level of the grouping variable. I tried some of the dplyr and data.table functions, but they seem to work only when the same number of rows is taken from every group.

Data <- data.frame(Shape = c("R", "R", "R", "C", "C", "T", "T", "T", "T"), Area = c(35, 30, 25, 32, 28, 40, 35, 33, 31))

I would like to get the first 2 Rs, the first C and the first 3 Ts. The expected outcome:

Out <- data.frame(Shape = c("R", "R", "C", "T", "T", "T"), Area = c(35, 30, 32, 40, 35, 33))

Solution

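  • We can use group_split to split the data into a list of data frames by 'Shape', then pass the per-group row counts to map2_dfr, which filters each piece to its first n rows and binds the results. Converting 'Shape' to a factor with levels = unique(Shape) keeps the groups in their order of appearance, so the counts c(2, 1, 3) line up with R, C and T.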

    library(dplyr)
    library(purrr)
    Data %>% 
      group_split(Shape = factor(Shape, levels = unique(Shape))) %>% 
      map2_dfr(., c(2, 1, 3), ~ .x %>%
                                 filter(row_number() <= .y))
    # A tibble: 6 x 2
    #  Shape  Area
    #* <fct> <dbl>
    #1 R        35
    #2 R        30
    #3 C        32
    #4 T        40
    #5 T        35
    #6 T        33
    

    Another option is to add a column 'n' by indexing a named vector of counts with 'Shape', then group by 'Shape' and filter on that column:

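    Data %>%
        # look up each row's limit in a vector of counts named by Shape
        mutate(n = setNames(c(2, 1, 3), unique(Shape))[as.character(Shape)]) %>% 
        group_by(Shape) %>%
        # keep rows whose position within the group is within the limit
        filter(row_number() <= n[1]) %>%
        # drop the grouping and the helper column
        ungroup() %>%
        select(-n)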
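
For comparison, the same named-vector lookup can be sketched in base R without any packages. This is only an illustration, not part of the original answer: the names `keep`, `pos` and `Result` are made up here, and it assumes every level of Shape has an entry in `keep`.

```r
Data <- data.frame(
  Shape = c("R", "R", "R", "C", "C", "T", "T", "T", "T"),
  Area  = c(35, 30, 25, 32, 28, 40, 35, 33, 31)
)

# Hypothetical lookup: number of rows to keep for each level of Shape
keep <- c(R = 2, C = 1, T = 3)

# Position of each row within its own group (1, 2, 3, ... per Shape)
pos <- ave(seq_len(nrow(Data)), Data$Shape, FUN = seq_along)

# Keep the rows whose within-group position does not exceed the group's limit
Result <- Data[pos <= keep[as.character(Data$Shape)], ]
rownames(Result) <- NULL
```

Because the counts are matched by name rather than by position, the order of the entries in `keep` does not need to follow the order in which the groups appear in the data.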