Create new column based on mean of values that are found in specific rows in R

I have table like this and would like to count genes that appear the most (lets say top 10 genes) and then find out mean of tail_len for those top 10 genes.

	gene	tail_len
1	SPAC20G4.06c	3
2	SPCC613.06	5
3	SPAC6F6.03c	2
4	SPAC20G4.06c	3
5	SPBC23G7.15c	5
6	SPAC589.10c	2
7	SPBC23G7.15c	3
8	SPAC22H12.04c	1
9	SPAC22H12.04c	12
10	SPAC6G10.11c	8
11	SPAC589.10c	31
12	SPBC18E5.06	16

Solution

Here is a way with slice_max. I have defined two variables, ties_ok and max_n. The latter is set to 3 to test the code, you want max_n <- 110, the former can bee set to FALSE if you want to discard ties and keep only the first rows found.

df1 <- "    gene    tail_len
1   SPAC20G4.06c    3
2   SPCC613.06  5
3   SPAC6F6.03c     2
4   SPAC20G4.06c    3
5   SPBC23G7.15c    5
6   SPAC589.10c     2
7   SPBC23G7.15c    3
8   SPAC22H12.04c   1
9   SPAC22H12.04c   12
10  SPAC6G10.11c    8
11  SPAC589.10c     31
12  SPBC18E5.06     16"
df1 <- read.table(text = df1, header = TRUE)

suppressPackageStartupMessages(
  library(dplyr)
)

ties_ok <- TRUE
#ties_ok <- FALSE
max_n <- 3L
df1 %>%
  group_by(gene) %>%
  summarise(count = n(), mean_tail_len = mean(tail_len)) %>%
  slice_max(count, n = max_n, with_ties = ties_ok) %>%
  select(-count)
#> # A tibble: 4 × 2
#>   gene          mean_tail_len
#>   <chr>                 <dbl>
#> 1 SPAC20G4.06c            3  
#> 2 SPAC22H12.04c           6.5
#> 3 SPAC589.10c            16.5
#> 4 SPBC23G7.15c            4

^{Created on 2023-01-20 with reprex v2.0.2}