I'm trying to create a data frame in R that will summarise a football team's betting odds that are based in another.
For example, this very small sample table features the home team and the away team as well as the respective odds for the match.
matchData:
Key: HWO (Home Win Odds), DO (Draw Odds), AWO (Away Win Odds)
+----------+----------+------+------+------+
| HomeTeam | AwayTeam | HWO | DO | AWO |
+----------+----------+------+------+------+
| TeamA | TeamB | 1.30 | 5.20 | 9.50 |
| TeamC | TeamD | 1.59 | 4.20 | 6.30 |
| TeamE | TeamF | 3.00 | 5.50 | 1.70 |
| TeamB | TeamA | 1.50 | 4.50 | 8.70 |
| TeamD | TeamC | 1.25 | 4.20 | 8.00 |
| TeamF | TeamE | 1.40 | 5.00 | 7.20 |
+----------+----------+------+------+------+
Here is dput for this dataframe:
structure(list(HomeTeam = c("TeamA", "TeamC", "TeamE", "TeamB",
"TeamD", "TeamF"), AwayTeam = c("TeamB", "TeamD", "TeamF", "TeamA",
"TeamC", "TeamE"), HWO = c(1.3, 1.59, 3, 1.5, 1.25, 1.4), DO = c(5.2,
4.2, 5.5, 4.5, 4.2, 5), AWO = c(9.5, 6.3, 1.7, 8.7, 8, 7.2)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
The additional data frame I need is one that averages each team's odds . It must take into consideration whether that team has played home or away and use the appropriate figure for each match.
Below shows what the final table should look like:
oddsSummary:
Key: AvgWO (Average Win Odds), AvgDO (Average Draw Odds), AvgLO (Average Loss Odds)
+-------+------+------+------+
| Team | AvgWO|AvgDO |AvgLO |
+-------+------+------+------+
| TeamA | 5.00 | 4.85 | 5.50 |
| TeamB | 5.50 | 4.85 | 5.00 |
| TeamC | 4.80 | 4.20 | 3.78 |
| TeamD | 3.78 | 4.20 | 4.80 |
| TeamE | 5.10 | 5.25 | 1.55 |
| TeamF | 1.55 | 5.25 | 5.10 |
+-------+------+------+------+
Here is dput for this dataframe:
structure(list(Team = c("TeamA", "TeamB", "TeamC", "TeamD", "TeamE",
"TeamF"), AvgWO = c(5, 5.5, 4.8, 3.78, 5.1, 1.55), AvgDO = c(4.85,
4.85, 4.2, 4.2, 5.25, 5.25), AvgLO = c(5.5, 5, 7.55, 4.8, 2,
5.1)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
For example, using TeamA as an example…
Add up TeamA's win odds. If they play at home add the HWO figure and if they play away add the AWO figure. Then divide by the total number of matches they have played.
Add up TeamA's draw odds and divide by the total number of matches they have played.
Add up TeamA's loss odds. If they play at home add the AWO figure and if they play away add the HWO figure. Then divide by the total number of matches they have played.
I would appreciate anyone who has a solution for this. Please make sure it is a robust solution that can cope with the different amount of games played etc.
Interesting question.Here is solution, that yields different answer for avgLO for TeamC and TeamE but based on your description I think my solution below is correct. So please double check and let me know.
You can improve naming etc. but I hope this helps.
DF <-
data.frame(
HomeTeam = paste0("Team", c("A", "C", "E", "B", "D", "F")),
AwayTeam = paste0("Team", c("B", "D", "F", "A", "C", "E")),
HWO = c(1.3, 1.59, 3, 1.5, 1.25, 1.4),
DO = c(5.2, 4.2, 5.5, 4.5, 4.2, 5),
AWO = c(9.5, 6.3, 1.7, 8.7, 8, 7.2)
)
library(magrittr)
library(dplyr)
library(reshape2)
DF %>%
melt(c("HWO", "DO", "AWO"), value.name = "Team") %>%
mutate(WO = ifelse(variable == "HomeTeam", HWO, AWO),
LO = ifelse(variable == "HomeTeam", AWO, HWO)) %>%
group_by(Team) %>%
summarise(avgWO = mean(WO),
avgDO = mean(DO),
avgLO = mean(LO))
Result (convert to data.frame if tibble is not OK for you)
# A tibble: 6 x 4
Team avgWO avgDO avgLO
<chr> <dbl> <dbl> <dbl>
1 TeamA 5 4.85 5.5
2 TeamB 5.5 4.85 5
3 TeamC 4.80 4.2 3.78
4 TeamD 3.78 4.2 4.80
5 TeamE 5.1 5.25 1.55
6 TeamF 1.55 5.25 5.1