
How do I use two categorical columns to create one percentage column in R?


I have a data frame with two categorical variables: team and home_win. I would like to obtain the percentage of home wins per team (1 = home_win; 2 = home_loss). However, I can't figure out how to use two categorical variables to create a percentage.

Please help!

team home_win total_games
"red" 1 3
"blue" 1 1
"orange" 2 1
"red" 1 3
"red" 2 3
  dat <- data.frame(
    team = c("red", "blue", "orange", "red", "red"),
    home_win = c(1, 1, 2, 1, 2),
    total_games = c(3, 1, 1, 3, 3)
  )

Desired output:

team home_win total_games percentage
"red" 1 3 66.66
"blue" 1 1 100
"orange" 2 1 0
"red" 1 3 66.66
"red" 2 3 66.66

Solution

  • You could try this. If we recode the losses from 2 to 0, home_win becomes a 0/1 indicator, so the mean for each team is the proportion of home wins; multiplying by 100 gives the percentage.

    library(dplyr)
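    # Recode losses from 2 to 0 so that home_win is a 0/1 indicator
    dat$home_win = as.numeric(gsub(2, 0, dat$home_win))
    # Mean of the 0/1 indicator per team, scaled to a percentage
    dat %>% group_by(team) %>% summarise(win_perc = mean(home_win) * 100)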
    
    # A tibble: 3 × 2
      team   win_perc
      <chr>     <dbl>
    1 blue      100  
    2 orange      0  
    3 red        66.7
    
    

    Or, if you want to keep the other columns, use mutate() instead of summarise():

    dat$home_win = as.numeric(gsub(2, 0, dat$home_win))
    dat %>% group_by(team) %>% mutate(win_perc = mean(home_win) * 100)
    
    # A tibble: 5 × 4
    # Groups:   team [3]
      team   home_win total_games win_perc
      <chr>     <dbl>       <dbl>    <dbl>
    1 red           1           3     66.7
    2 blue          1           1    100  
    3 orange        0           1      0  
    4 red           1           3     66.7
    5 red           0           3     66.7
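
    If you would rather not overwrite home_win or load dplyr, the same idea can be sketched in base R. This is only one possible variant, not part of the answer above: it assumes the data frame is called dat, as in the code above, and that 1 always means a home win. The names win_perc and per_team are chosen here for illustration.

```r
# Sample data from the question (1 = home win, 2 = home loss)
dat <- data.frame(
  team = c("red", "blue", "orange", "red", "red"),
  home_win = c(1, 1, 2, 1, 2),
  total_games = c(3, 1, 1, 3, 3)
)

# Build a 0/1 win indicator on the fly instead of recoding the column.
is_win <- as.numeric(dat$home_win == 1)

# ave() applies the function within each team and returns a vector
# aligned with the original rows, so every row keeps its other columns.
dat$win_perc <- ave(is_win, dat$team, FUN = function(x) mean(x) * 100)

# One value per team instead of one per row: a named vector keyed by team.
per_team <- tapply(is_win, dat$team, mean) * 100

print(dat)
print(per_team)
```

    Because the comparison home_win == 1 is done on a temporary vector, the original 1/2 coding stays in dat, which may matter if later steps still rely on it.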