I have a df with two categorical variables: team and home_win. I would like to obtain the percentage of home wins per team (1 = home win; 2 = home loss). However, I can't figure out how to use two categorical variables to create a percentage.
Please help!
team | home_win | total_games |
---|---|---|
"red" | 1 | 3 |
"blue" | 1 | 1 |
"orange" | 2 | 1 |
"red" | 1 | 3 |
"red" | 2 | 3 |
dat <- data.frame(
team = c("red", "blue", "orange", "red", "red"),
home_win = c(1, 1, 2, 1, 2),
total_games = c(3, 1, 1, 3, 3)
)
team | home_win | total_games | percentage |
---|---|---|---|
"red" | 1 | 3 | 66.66 |
"blue" | 1 | 1 | 100 |
"orange" | 2 | 1 | 0 |
"red" | 1 | 3 | 66.66 |
"red" | 2 | 3 | 66.66 |
You could try this. If we recode the 2 as 0, home_win becomes a 0/1 indicator, so we can simply take the mean for each team to get its share of home wins.
library(dplyr)
dat$home_win = as.numeric(gsub(2, 0, dat$home_win))
dat %>% group_by(team) %>% summarise(win_perc = mean(home_win) * 100)
# A tibble: 3 × 2
team win_perc
<chr> <dbl>
1 blue 100
2 orange 0
3 red 66.7
Or if you want to keep the other cols:
dat$home_win = as.numeric(gsub(2, 0, dat$home_win))
dat %>% group_by(team) %>% mutate(win_perc = mean(home_win) * 100)
# A tibble: 5 × 4
# Groups: team [3]
team home_win total_games win_perc
<chr> <dbl> <dbl> <dbl>
1 red 1 3 66.7
2 blue 1 1 100
3 orange 0 1 0
4 red 1 3 66.7
5 red 0 3 66.7
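If you would rather not overwrite home_win, a variation on the same idea is to take the mean of the logical test home_win == 1, since mean() counts TRUE as 1 and FALSE as 0. This is only a sketch using the sample data from the question; the names per_team and per_row are made up for illustration, and it assumes home_win only ever holds 1 (win) or 2 (loss) with no missing values.

```r
library(dplyr)

# Sample data from the question (1 = home win, 2 = home loss)
dat <- data.frame(
  team = c("red", "blue", "orange", "red", "red"),
  home_win = c(1, 1, 2, 1, 2),
  total_games = c(3, 1, 1, 3, 3)
)

# One row per team: home_win == 1 is TRUE/FALSE, and mean() treats
# TRUE as 1 and FALSE as 0, so this is the share of home wins
per_team <- dat %>%
  group_by(team) %>%
  summarise(win_perc = mean(home_win == 1) * 100)

# Same calculation, but keeping every original row and column
per_row <- dat %>%
  group_by(team) %>%
  mutate(win_perc = mean(home_win == 1) * 100) %>%
  ungroup()
```

Because the test is against 1, this gives the same percentages whether home_win still uses 2 for a loss or has already been recoded to 0.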