I'm trying to combine the functions slice_max from dplyr and fct_other from forcats to get a top n slice of a dataframe, based on a numeric variable, but I don't want to lose the factors outside the top n. I want those other factors to be labelled "Others" so that I can summarise or count them afterwards if I need to.
For example, with a dataframe similar to this:
df <- data.frame(acron = c("AA", "BB", "CC", "DD", "EE", "FF", "GG"), value = c(6, 4, 1, 10, 3, 1, 1))
If I want the top 3 subjects by their "value", I can use the following code:
library(dplyr)
df %>%
  slice_max(value, n = 3)
This gives the following result:
acron value
DD 10
AA 6
BB 4
But I would like to assign the level "Others" to the dropped "acron" values, similar to the result of the function fct_other from forcats. I've tried this code but it doesn't work:
df %>%
  mutate(acron = fct_other(acron, keep = slice_max(value, n = 3), other_level = "Others"))
Any suggestions on how to get something like this?
acron value
DD 10
AA 6
BB 4
Others 3
Others 1
Others 1
Others 1
Or even like this:
acron value
DD 10
AA 6
BB 4
Others 6
One option could be using fct_lump_n():
library(dplyr)
library(forcats)
df %>%
  mutate(acron = fct_lump_n(acron, n = 3, w = value))
acron value
1 AA 6
2 BB 4
3 Other 1
4 DD 10
5 Other 3
6 Other 1
7 Other 1
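To get the exact label and the two layouts asked for in the question, the following is a minimal sketch rather than the only way to do it. It assumes the `df` from the question and relies on the `other_level` argument of fct_lump_n() to rename the lumped level. The object names `lumped`, `detailed`, `summarised`, `top` and `via_other` are only illustrative. The last part shows one way the original fct_other() attempt could be made to work, by passing `keep` a character vector of names instead of a data frame.

```r
library(dplyr)
library(forcats)

df <- data.frame(acron = c("AA", "BB", "CC", "DD", "EE", "FF", "GG"),
                 value = c(6, 4, 1, 10, 3, 1, 1))

# Lump everything outside the top 3 (weighted by value) into "Others"
lumped <- df %>%
  mutate(acron = fct_lump_n(acron, n = 3, w = value, other_level = "Others"))

# First layout: one row per original observation, largest values first
detailed <- lumped %>%
  arrange(desc(value))

# Second layout: "Others" collapsed into a single summed row, placed last
summarised <- lumped %>%
  group_by(acron) %>%
  summarise(value = sum(value)) %>%
  arrange(acron == "Others", desc(value))

# Alternative with fct_other(): 'keep' needs the names of the top 3,
# so pull them out of the slice_max() result first
top <- df %>%
  slice_max(value, n = 3) %>%
  pull(acron)

via_other <- df %>%
  mutate(acron = fct_other(acron, keep = top, other_level = "Others"))
```

With this data there are no ties at the cut-off, so both approaches label the same rows. If ties are possible, slice_max() keeps tied rows by default (`with_ties = TRUE`), and fct_lump_n() has its own `ties.method` argument, so the two results may differ.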