I want to minimize a function but cannot make any progress.
Problem setting:
mtcars$gender <- c(rep(1, 10), rep(0, 4), rep(1, 6), rep(0 , 12))
predictions <- data.frame(
c(0.05, 0.03, 0.99, 0.07, 0.00, 0.10, 0.00, 0.84, 0.92, 0.01, 0.03, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 1.00, 1.00, 1.00, 0.97, 0.00, 0.00, 0.00, 0.00, 1.00, 0.86, 0.84, 0.01, 0.08, 0.00, 0.86),
c(0.95, 0.97, 0.01, 0.80, 0.07, 0.82, 0.00, 0.14, 0.08, 0.95, 0.94, 0.03, 0.03, 0.03, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.03, 0.02, 0.07, 0.02, 0.01, 0.00, 0.12, 0.16, 0.10, 0.79, 0.05, 0.13),
c(0.00, 0.00, 0.00, 0.13, 0.93, 0.08, 1.00, 0.02, 0.00, 0.04, 0.03, 0.97, 0.97, 0.97, 1.00, 1.00, 1.00, 0.00, 0.00, 0.00, 0.00, 0.98, 0.93, 0.98, 0.99, 0.00, 0.02, 0.00, 0.89, 0.13, 0.95, 0.01))
colnames(predictions) <- c(4, 6, 8)
actual.probs <- apply(predictions, 1, which.max)
actual.probs <- as.data.frame.matrix(prop.table(table(mtcars$gender, actual.probs)))
real.probs <- data.frame(matrix(c(0.1, 0.1, 0.2, 0.2, 0.2, 0.2), nrow = 2, ncol = 3))
I used a prediction algorithm that gives me the probabilities that a car has 4, 6 or 8 cylinders. The result is stored in "predictions". However, the resulting distribution (actual.probs) differs from the distribution seen in reality (real.probs). To adjust for that, I want to multiply the probabilities by one weight per class, pick the class with the highest weighted probability, and recalculate the table. The result I want is the set of weights that gives the smallest deviation from the real distribution.
optimresult <- predictions
fn <- function(v) {
  weight1 <- v[1]
  weight2 <- v[2]
  weight3 <- v[3]
  optimresult[,1] <- optimresult[,1] * weight1
  optimresult[,2] <- optimresult[,2] * weight2
  optimresult[,3] <- optimresult[,3] * weight3
  result <- apply(optimresult, 1, which.max) # get highest probability
  actualprobs <- prop.table(table(mtcars[["gender"]], result))
  return <- sum(abs(real.probs - actualprobs))
}
optim(c(1, 1, 1), fn)
The start values are all 1, but the optimization does not seem to work. What am I doing wrong?
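The problem is that small changes to the parameter values do not change the result of fn: which.max only returns a different class once a weight has moved far enough to flip the winner for at least one row. The objective is therefore piecewise constant, and the default method of optim() (Nelder-Mead) thinks it has converged before it actually has.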
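To see the flat spots directly, here is a minimal sketch on made-up data. The matrix probs, the vector target and the function toy_fn are hypothetical stand-ins, not the objects from the question. A tiny change in one weight leaves the objective untouched, while a large change moves it.

```r
# Toy data (hypothetical): 4 observations, 3 class probabilities each.
probs <- matrix(c(0.55, 0.35, 0.10,
                  0.20, 0.50, 0.30,
                  0.10, 0.20, 0.70,
                  0.45, 0.35, 0.20),
                ncol = 3, byrow = TRUE)
target <- c(0.25, 0.25, 0.50)  # hypothetical target class shares

# Objective: weight the columns, pick the arg max per row, and compare
# the resulting class shares with the target.
toy_fn <- function(v) {
  weighted <- sweep(probs, 2, v, "*")
  picked <- factor(apply(weighted, 1, which.max), levels = 1:3)
  sum(abs(target - prop.table(table(picked))))
}

base  <- toy_fn(c(1, 1, 1))
nudge <- toy_fn(c(1.001, 1, 1))  # small step: no row changes its winning class
jump  <- toy_fn(c(1, 2, 1))      # large step: several rows change class

print(c(base = base, nudge = nudge, jump = jump))
```

Because base and nudge are identical, a local search that probes nearby points gets no information about which direction to move in.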
Using method = "SANN" (simulated annealing) gives a better result, although I am not sure whether it is the best result you can get with that sample dataset.
I also made some simplifications to your function.
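fn <- function(v) {
  # multiply column j by v[j]; predictions * v would recycle v down the rows instead
  weighted_preds <- sweep(predictions, 2, v, "*")
  # index of the highest weighted probability; fixed levels keep the table 2 x 3
  result <- factor(apply(weighted_preds, 1, which.max), levels = 1:3)
  actualprobs <- prop.table(table(mtcars[["gender"]], result))
  sum(abs(real.probs - actualprobs))
}
optim(c(100, 1, 1), fn, method = "SANN")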
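One detail worth checking when weighting the columns of a data frame: multiplying a data frame by a plain vector recycles the vector down the rows, column after column, rather than applying one weight per column. The sketch below uses a hypothetical data frame of ones (df and v are made up for illustration) so that each cell shows which weight it received.

```r
# Toy data frame (hypothetical): 2 rows, 3 columns of ones, so every
# cell of the product shows which weight was applied to it.
df <- data.frame(a = c(1, 1), b = c(1, 1), c = c(1, 1))
v  <- c(10, 20, 30)

recycled <- df * v                # v is recycled cell by cell in column order
by_col   <- sweep(df, 2, v, "*")  # column j is multiplied by v[j]

print(recycled)
print(by_col)
```

With three weights and a number of rows that is not a multiple of three, the recycled version mixes all weights within each column, so sweep() (or an equivalent such as t(t(df) * v)) is the safer choice.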
Try different starting values to see if you can get an improvement. Having more rows in predictions will also help, because the objective then changes in smaller steps.
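As an alternative way of trying many starting points, here is a rough sketch of a brute-force grid scan (a different technique from SANN). It relies on the fact that only the ratios between positive weights affect which.max, so the first weight can be fixed at 1. The data here are simulated, and the names probs, group, target and objective are hypothetical stand-ins for predictions, mtcars$gender, real.probs and fn.

```r
set.seed(1)  # reproducible toy data

# Hypothetical stand-ins for the real inputs: 32 rows of class
# probabilities, a binary group, and a 2 x 3 target distribution.
n <- 32
raw <- matrix(runif(n * 3), ncol = 3)
probs <- raw / rowSums(raw)
group <- rep(c(1, 0), each = n / 2)
target <- matrix(c(0.1, 0.1, 0.2, 0.2, 0.2, 0.2), nrow = 2)

objective <- function(v) {
  weighted <- sweep(probs, 2, v, "*")
  picked <- factor(apply(weighted, 1, which.max), levels = 1:3)
  shares <- prop.table(table(factor(group, levels = 0:1), picked))
  sum(abs(target - shares))
}

# Only ratios of weights matter, so fix the first weight at 1 and scan
# the other two on a log scale.
grid <- expand.grid(w2 = 10^seq(-2, 2, length.out = 41),
                    w3 = 10^seq(-2, 2, length.out = 41))
grid$value <- mapply(function(w2, w3) objective(c(1, w2, w3)),
                     grid$w2, grid$w3)

best <- grid[which.min(grid$value), ]
print(best)
```

The best grid point can then be passed to optim() as a starting value, or used directly if the grid is fine enough for the purpose.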