I'm working with NHL player data and want to compare one selected player's points to those of the rest of the population. The player data looks like this:
Player Season Team  Position    GP   TOI     G     A     P    P1 `P/60`
<chr>   <int> <chr> <chr>    <int> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
Aaron~   2019 FLA   D           35  603.     3     2     5     3   0.5
Adam ~   2019 CBJ   D            4   35.5    0     0     0     0   0
Adam ~   2019 T.B   L           23  218.     2     7     9     5   2.48
and so on for the rest of the league. I'd like to compare a summary statistic for one of the observations against the same statistic for the rest of the data set, something like this:
Player   Season Team  Position `Summary Statistic`
<chr>     <int> <chr> <chr>                  <int>
Kasperi    2019 FLA   D                         45
"Others"   2019 CBJ   D                         53
I've seen fct_lump used to keep the top records, sorted on some count, but when I tried something similar with the Player names I couldn't get it to work:
NHL %>%
mutate(Player = fct_lump(Player,
Kasperi Kapanen = "Kasperi Kapanen",
other = !("Kasperi Kapanen")))
fct_lump is not the right tool for this: it lumps levels by how often they occur, not by name. For one player against all other observations, use dplyr's if_else:
library(dplyr)

# Keep the chosen player's name; relabel every other row as "others"
NHL %>%
  mutate(Player = if_else(Player == "Kasperi Kapanen", "Kasperi Kapanen", "others"))
Or use case_when if you need to keep several players separate (it replaces a chain of nested ifelse calls):
NHL %>%
  mutate(Player = case_when(
    Player == "Kasperi Kapanen" ~ "Kasperi Kapanen",
    Player == "Adam" ~ "Adam",
    TRUE ~ "others"
  ))
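Once the Player column is relabelled, the comparison table from the question is a group_by/summarise away. The following is only a minimal sketch: the rows are made-up stand-ins for the real NHL tibble, and mean points (the `mean_points` column) is an assumed choice of summary statistic, since the question does not say which one is wanted. It also shows forcats' fct_other, which lumps by name and so may be closer to what the fct_lump attempt was after.

```r
library(dplyr)
library(forcats)

# Made-up toy rows standing in for the real NHL tibble (values are illustrative only)
NHL <- data.frame(
  Player = c("Kasperi Kapanen", "Player A", "Player B", "Player C"),
  Season = 2019L,
  P      = c(20, 5, 9, 16),
  stringsAsFactors = FALSE
)

# Relabel everyone except the chosen player, then summarise each group.
# mean(P) is an assumption; swap in whatever statistic you need.
comparison <- NHL %>%
  mutate(Player = if_else(Player == "Kasperi Kapanen", Player, "Others")) %>%
  group_by(Player, Season) %>%
  summarise(mean_points = mean(P), .groups = "drop")

# forcats alternative: fct_other() keeps the named levels and lumps the rest
lumped <- fct_other(NHL$Player, keep = "Kasperi Kapanen", other_level = "Others")
```

`comparison` then has one row for the chosen player and one for "Others". Grouping on Team or Position as well would split "Others" into one row per team or position, which may or may not be what you want.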