I have a two-variable data frame grouped by Shape, and I would like to extract the first n rows of each group, where n is different for each level of the grouping variable. I tried some dplyr and data.table functions, but they seem to work only when n is the same for every group.
Data <- data.frame(Shape = c("R", "R", "R", "C", "C", "T", "T", "T", "T"), Area = c(35, 30, 25, 32, 28, 40, 35, 33, 31))
I would like to get the first 2 Rs, the first C and the first 3 Ts. The expected outcome:
Out <- data.frame(Shape = c("R", "R", "C", "T", "T", "T"), Area = c(35, 30, 32, 40, 35, 33))
We can use group_split to split the data by the 'Shape' column into a list of data frames, then pass the per-group limits 'n' to map2 and filter each data frame to that many rows.
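library(dplyr)
library(purrr)
Data %>%
  # factor levels in order of appearance (R, C, T), so the list order matches c(2, 1, 3)
  group_split(Shape = factor(Shape, levels = unique(Shape))) %>%
  # keep the first .y rows of each group's data frame and row-bind the results
  map2_dfr(., c(2, 1, 3), ~ .x %>%
             filter(row_number() <= .y))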
# A tibble: 6 x 2
# Shape Area
#* <fct> <dbl>
#1 R 35
#2 R 30
#3 C 32
#4 T 40
#5 T 35
#6 T 33
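For readers without dplyr/purrr, the same split-then-map idea can be sketched in base R. This is a minimal sketch assuming the limits are supplied in the order in which the groups first appear; `n_per_group`, `groups` and `res` are illustrative names, not part of the answer above.

```r
# Sample data from the question
Data <- data.frame(
  Shape = c("R", "R", "R", "C", "C", "T", "T", "T", "T"),
  Area  = c(35, 30, 25, 32, 28, 40, 35, 33, 31)
)

# Hypothetical name: rows to keep per group, in order of first appearance (R, C, T)
n_per_group <- c(2, 1, 3)

# Factor levels in order of appearance so split() returns R, C, T
# (a plain character split would sort the groups alphabetically: C, R, T)
groups <- split(Data, factor(Data$Shape, levels = unique(Data$Shape)))

# head() each group's data frame with its own n, then stack the pieces
res <- do.call(rbind, unname(Map(head, groups, n_per_group)))
rownames(res) <- NULL
res
```

`head()` returns the whole group when n exceeds its number of rows, so an over-large limit is harmless.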
Another option is to create a column 'n' from a named vector, then group by 'Shape' and filter.
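Data %>%
  # look up each row's n from a vector named by Shape in order of appearance (R, C, T)
  mutate(n = setNames(c(2, 1, 3), unique(Shape))[as.character(Shape)]) %>%
  group_by(Shape) %>%
  # n is constant within a group, so n[1] is that group's limit
  filter(row_number() <= n[1]) %>%
  select(-n)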
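The named-vector lookup from the second option also carries over to base R, where `ave()` with `seq_along` plays the role of `row_number()` within each group. A minimal sketch; `n_lookup`, `row_in_group`, `keep` and `res` are hypothetical names chosen for illustration.

```r
# Sample data from the question
Data <- data.frame(
  Shape = c("R", "R", "R", "C", "C", "T", "T", "T", "T"),
  Area  = c(35, 30, 25, 32, 28, 40, 35, 33, 31)
)

# Hypothetical named lookup: how many rows to keep for each Shape
n_lookup <- c(R = 2, C = 1, T = 3)

# Position of each row within its Shape group: 1, 2, 3, 1, 2, 1, 2, 3, 4
row_in_group <- ave(seq_along(Data$Shape), Data$Shape, FUN = seq_along)

# as.character() guards against Shape being a factor, where indexing
# would use the integer codes instead of the names
keep <- row_in_group <= n_lookup[as.character(Data$Shape)]
res <- Data[keep, ]
res
```

Because the limits are looked up by name, their order no longer has to match the order of the groups. A Shape missing from `n_lookup` gives NA in the comparison, and subsetting with NA produces all-NA rows, so every group needs an entry. The original row names (1, 2, 4, 6, 7, 8) are kept.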