I'm outside my domain expertise here. I've pieced together an 'answer' (with lots of googling and ChatGPT), but I need someone to confirm whether I'm actually doing what I intended to do.
I'm following a tutorial on Creating an Expected Points Model Inspired by Pythagorean Expectation. The author uses Excel's Solver to find the optimal exponent coefficient c by searching for the lowest MSE. It is my understanding that optimize() does a similar job in R.

I have come up with the following function, but I don't understand how the values get passed to optimize(), or why it looks like the function produces three values: predicted_xPts_perc, se, and mean(se). Shouldn't it produce two?
# Objective function
objective_function <- function(c) {
  predicted_xPts_perc <- data$xg_for_per_match^c / (data$xg_for_per_match^c + data$xg_against_per_match^c)
  se <- (predicted_xPts_perc - data$actual_points_perc)^2
  mean(se)
}

# Optimize the objective
# Note: explicit lower/upper take precedence over interval, so the search runs on [1, 3], not [0, 5]
result <- optimize(f = objective_function, interval = c(0, 5), lower = 1, upper = 3)
optimal_c <- result$minimum
print(optimal_c)
Here is the data (sample) and full code block:
data <- structure(list(xg_for_per_match = c(2.431, 2.256, 2.399, 1.86777777777778,
2.56, 1.776, 1.59222222222222, 1.665, 0.907, 1.691, 1.52555555555556,
0.56, 1.46555555555556, 0.754, 1.955, 1.285, 1.38428571428571,
1.282, 1.86625, 1.5075), xg_against_per_match = c(0.757, 0.682,
0.739, 1.40555555555556, 0.849, 1.409, 1.17777777777778, 1.822,
1.938, 2.489, 1.70666666666667, 3.136, 2.37555555555556, 2.985,
0.78375, 1.56, 1.43, 0.812, 0.945, 0.94875), actual_points_perc = c(0.866666666666667,
0.766666666666667, 0.733333333333333, 0.666666666666667, 0.6,
0.6, 0.592592592592593, 0.566666666666667, 0.4, 0.4, 0.37037037037037,
0.2, 0.111111111111111, 0, 0.791666666666667, 0.625, 0.619047619047619,
0.6, 0.541666666666667, 0.416666666666667), se = c(0.0020194397996264,
0.0223794165299103, 0.0323996328399904, 0.000796304748754376,
0.0905484326863811, 0.00018817673840017, 0.00288909832711458,
0.0124545498241967, 0.0485423096732081, 0.0070889390780061, 0.0054423086667482,
0.0285940157162698, 0.027082829359042, 0.00359736139900119, 0.00488179512531288,
0.048737642866935, 0.0183025693952632, 0.0129244422758347, 0.0646460987510766,
0.0897732500472823)), row.names = c(NA, -20L), class = c("tbl_df",
"tbl", "data.frame"))
# Objective function
objective_function <- function(c) {
  predicted_xPts_perc <- data$xg_for_per_match^c / (data$xg_for_per_match^c + data$xg_against_per_match^c)
  se <- (predicted_xPts_perc - data$actual_points_perc)^2
  mean(se)
}

# Optimize the objective
result <- optimize(f = objective_function, interval = c(0, 5), lower = 1, upper = 3)
optimal_c <- result$minimum
print(optimal_c)
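On the "three values" question: an R function returns only its last evaluated expression, so objective_function() hands exactly one number, mean(se), back to optimize(); predicted_xPts_perc and se are intermediate variables that are discarded when the call ends. optimize() then calls your function repeatedly with trial values of c until the MSE stops shrinking. Here is a minimal sketch, using made-up numbers rather than your dataset, that traces those calls:

```r
# Toy data standing in for the question's columns (hypothetical values)
xg_for     <- c(2.4, 1.8, 0.9, 1.5)
xg_against <- c(0.8, 1.4, 1.9, 1.0)
actual     <- c(0.85, 0.65, 0.40, 0.55)

objective_function <- function(c) {
  cat("optimize() is trying c =", c, "\n")           # trace each candidate value
  predicted <- xg_for^c / (xg_for^c + xg_against^c)  # intermediate, discarded on exit
  se <- (predicted - actual)^2                       # intermediate, discarded on exit
  mean(se)                                           # last expression = the one value returned
}

result <- optimize(objective_function, interval = c(0, 5))
result$minimum    # the c that minimizes the MSE
result$objective  # the MSE achieved at that c
```

Running this prints one trace line per call, which makes it visible that optimize() passes a single scalar c each time and receives a single scalar back.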
That does work, but R has the nls() function for performing non-linear least-squares fits directly.

model <- nls(actual_points_perc ~ xg_for_per_match^c / (xg_for_per_match^c + xg_against_per_match^c),
             data = data, start = list(c = 1))
summary(model)
# Formula: actual_points_perc ~ xg_for_per_match^c/(xg_for_per_match^c +
# xg_against_per_match^c)
#
# Parameters:
# Estimate Std. Error t value Pr(>|t|)
# c 1.0249 0.1935 5.297 4.11e-05 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.1242 on 19 degrees of freedom
#
# Number of iterations to convergence: 2
# Achieved convergence tolerance: 9.647e-06
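Rather than reading the estimate off summary(model), you can pull it out programmatically with coef() and get the model's predictions with predict(). A sketch using hypothetical data in place of the question's data frame:

```r
# Hypothetical data mirroring the question's column names (not the real dataset)
toy <- data.frame(
  xg_for_per_match     = c(2.4, 1.8, 0.9, 1.5),
  xg_against_per_match = c(0.8, 1.4, 1.9, 1.0),
  actual_points_perc   = c(0.85, 0.65, 0.40, 0.55)
)

model <- nls(actual_points_perc ~ xg_for_per_match^c /
               (xg_for_per_match^c + xg_against_per_match^c),
             data = toy, start = list(c = 1))

optimal_c  <- coef(model)[["c"]]  # fitted exponent as a plain number
fitted_pts <- predict(model)      # predicted points percentages per row
optimal_c
```

This is handy if the exponent feeds into later calculations, since coef() gives you the number directly instead of a printed table.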