Tags: r, dplyr, tidyverse, forcats

Is there a way in R to combine the functions slice_max (dplyr) and fct_other (forcats)?


I'm trying to combine slice_max() from dplyr and fct_other() from forcats to get the top n rows of a data frame, based on a numeric variable, without losing the factors outside the top n. I want those remaining factors to be labelled "Others" so that I can summarise or count them afterwards if needed.

For example, with a dataframe similar to this:

df <- data.frame(acron = c("AA", "BB", "CC", "DD", "EE", "FF", "GG"), value = c(6, 4, 1, 10, 3, 1, 1))

If I want the top 3 subjects by their "value", I can use the following code:

df %>% 
  slice_max(value, n = 3)

which gives the following result:

acron value
DD 10
AA 6
BB 4

But I would like the dropped "acron" values to be assigned the level "Others", similar to what fct_other() from forcats produces. I've tried this code, but it doesn't work:

df %>% 
  mutate(acron = fct_other(acron, keep = slice_max(value, n = 3), other_level = "Others"))

Any suggestions for getting something like this?

acron value
DD 10
AA 6
BB 4
Others 3
Others 1
Others 1
Others 1

Or even like this:

acron value
DD 10
AA 6
BB 4
Others 6


Solution

  • One option is fct_lump_n() with value as the weight, which keeps the three heaviest levels and lumps the rest into "Other":

    df %>%
      mutate(acron = fct_lump_n(acron, n = 3, w = value))
    
      acron value
    1    AA     6
    2    BB     4
    3 Other     1
    4    DD    10
    5 Other     3
    6 Other     1
    7 Other     1
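
To get the exact label and row order asked for in the question, the idea above can be extended as in the sketch below. It assumes the same df as in the question. The names lumped, top_acron, kept, by_row and by_level are made up for illustration. Option B shows the literal slice_max() + fct_other() combination: the original attempt likely fails because slice_max() works on a data frame rather than on a column inside mutate(), so the top names are pulled out first. Note that with repeated acron values, fct_lump_n() ranks levels by their summed weights, which is not the same as slicing individual rows.

```r
library(dplyr)
library(forcats)

df <- data.frame(
  acron = c("AA", "BB", "CC", "DD", "EE", "FF", "GG"),
  value = c(6, 4, 1, 10, 3, 1, 1)
)

# Option A: lump everything outside the top 3 (weighted by `value`) into "Others".
lumped <- df %>%
  mutate(acron = fct_lump_n(acron, n = 3, w = value, other_level = "Others"))

# Option B: take the top-3 names from slice_max() and hand them to fct_other().
top_acron <- df %>%
  slice_max(value, n = 3) %>%
  pull(acron)

kept <- df %>%
  mutate(acron = fct_other(acron, keep = top_acron, other_level = "Others"))

# Row-level result: top rows first (by value), "Others" rows last.
by_row <- lumped %>%
  arrange(acron == "Others", desc(value))
print(by_row)

# Aggregated result: one row per level, with the "Others" values summed.
by_level <- lumped %>%
  group_by(acron) %>%
  summarise(value = sum(value)) %>%
  arrange(acron == "Others", desc(value))
print(by_level)
```

For this data both options give the same acron column. by_row corresponds to the first desired table in the question and by_level to the second.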