I have a dataframe df
of football matches and corresponding mean scoring rates for each team. The first 2 columns are Home and Away teams, the 3rd and 4th columns are mean scoring rates for Home and Away teams respectively. For example:
Home Away HomeRate AwayRate
1 Arsenal Bristol 1.3 1.3
2 Chelsea Newcastle 2.4 2.3
3 Liverpool Leeds 3.3 1.7
4 Bolton Fulham 2.1 2.5
I want to generate 3 new columns, "HomeWin", "Draw", "AwayWin" giving Poisson probabilities of home win, draw, away win respectively for each match and add these columns to my dataframe. I assume events are independent.
I can do this manually for each individual match separately. But I don't know how to do this all at once for the whole table. Take for instance 2nd row (Chelsea vs Newcastle). I can create a probability matrix consisting of probabilities for each respective scoreline between Chelsea and Newcastle (0-0, 0-1 0-2, etc):
H <- c(dpois(0:20, df[2,3]), ppois(20, df[2,3], lower.tail=FALSE))
A <- c(dpois(0:20, df[2,4]), ppois(20, df[2,3], lower.tail=FALSE))
ProbMatrix = outer(H, A)
In the above block, df[2,3]
is Chelsea mean score rate, df[2,4]
is Newcastle mean score rate and I am assuming max number of possible goals scored for each side is 20 (hence I am using dpois(0:20)
and ppois(20)
in the above, to get the total sum of probabilities as close to 1 as possible.
Then I can simply sum over various elements of the matrix above to generate Home, Away and Draw win probabilities:
Prob_Home_Win = sum(ProbMatrix[lower.tri(ProbMatrix)])
Prob_Away_Win = sum(ProbMatrix[upper.tri(ProbMatrix)])
Prob_Draw = sum(diag(ProbMatrix))
But how can I generate efficiently the whole HomeWin, AwayWin and Draw columns yielding the probabilities for each match in df
?
apply
a function to your data.frame, e.g. like so:
win_rate <- function(goal_rates) {
H <- c(dpois(0:20, goal_rates[1]), ppois(20, goal_rates[1], lower.tail=FALSE)
# !! note goal_rates[1] for `ppois`, not sure if you had a typo...
A <- c(dpois(0:20, goal_rates[2]), ppois(20, goal_rates[2], lower.tail=FALSE))
ProbMatrix = outer(H, A)
Prob_Home_Win = sum(ProbMatrix[lower.tri(ProbMatrix)])
Prob_Away_Win = sum(ProbMatrix[upper.tri(ProbMatrix)])
Prob_Draw = sum(diag(ProbMatrix))
return(c(Prob_Home_Win, Prob_Away_Win, Prob_Draw))
}
results <- apply(df[3:4], 1, win_rate) |>
t() |> as.data.frame() |> setNames(c('HomeWin', 'HomeLoss', 'Draw'))
cbind(df, results)
Home Away HomeRate AwayRate HomeWin HomeLoss Draw 1 Arsenal Bristol 1.3 1.3 0.3680430 0.3680430 0.2639140 2 Chelsea Newcastle 2.4 2.3 0.4231169 0.3873484 0.1895347 3 Liverpool Leeds 3.3 1.7 0.6870952 0.1676645 0.1452403 4 Bolton Fulham 2.1 2.5 0.3335564 0.4774577 0.1889859