Optimization problem - minimizing a function in R

I want to minimize a function in R, but I am stuck.

Problem setting:

mtcars$gender <- c(rep(1, 10), rep(0, 4), rep(1, 6), rep(0 , 12))

predictions <- data.frame(
  c(0.05,   0.03,   0.99,   0.07,   0.00,   0.10,   0.00,   0.84,   0.92,   0.01,   0.03,   0.00,   0.00,   0.00,   0.00,   0.00,   0.00,   1.00,   1.00,   1.00,   0.97,   0.00,   0.00,   0.00,   0.00,   1.00,   0.86,   0.84,   0.01,   0.08,   0.00,   0.86),
  c(0.95,   0.97,   0.01,   0.80,   0.07,   0.82,   0.00,   0.14,   0.08,   0.95,   0.94,   0.03,   0.03,   0.03,   0.00,   0.00,   0.00,   0.00,   0.00,   0.00,   0.03,   0.02,   0.07,   0.02,   0.01,   0.00,   0.12,   0.16,   0.10,   0.79,   0.05,   0.13),
  c(0.00,   0.00,   0.00,   0.13,   0.93,   0.08,   1.00,   0.02,   0.00,   0.04,   0.03,   0.97,   0.97,   0.97,   1.00,   1.00,   1.00,   0.00,   0.00,   0.00,   0.00,   0.98,   0.93,   0.98,   0.99,   0.00,   0.02,   0.00,   0.89,   0.13,   0.95,   0.01))
colnames(predictions) <- c(4, 6, 8)


actual.probs <- apply(predictions, 1, which.max) 
actual.probs <- as.data.frame.matrix(prop.table(table(mtcars$gender, actual.probs)))
real.probs <- data.frame(matrix(c(0.1, 0.1, 0.2, 0.2, 0.2, 0.2), nrow = 2, ncol = 3))

I used a prediction algorithm that gives me the probabilities of a car having 4, 6 or 8 cylinders. The result is stored in "predictions". However, the resulting distribution (actual.probs) differs from the distribution seen in reality (real.probs). To adjust for that, I want to multiply the probabilities by one weight per class, pick the class with the highest weighted probability for each car, and recalculate the table. What I am looking for are the weights that give the smallest deviation from the real distribution.

optimresult <- predictions 

fn <- function(v) {
  weight1 <- v[1]
  weight2 <- v[2]
  weight3 <- v[3]

  optimresult[,1] <- optimresult[,1] * weight1
  optimresult[,2] <- optimresult[,2] * weight2
  optimresult[,3] <- optimresult[,3] * weight3

  result <- apply(optimresult, 1, which.max) # get the class with the highest probability

  actualprobs <- prop.table(table(mtcars[["gender"]], result))
  sum(abs(real.probs - actualprobs)) # total absolute deviation, returned as the function value
}

optim(c(1, 1, 1), fn)

The start values are all one, but the function does not seem to work. What am I doing wrong?

Solution

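The problem is that the small changes optim() makes to the parameter values do not change the result of fn, so the algorithm thinks it has converged before it actually has. The value of fn only changes when a change in the weights flips the winning class for at least one row, so it is flat almost everywhere, and the default Nelder-Mead method sees no improvement around the starting point.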
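To see the flat regions in isolation, here is a minimal sketch on made-up toy data. The names probs, target and step_objective are hypothetical and not part of the question; the function only mimics the structure of fn (weight the columns, pick the winner per row, compare class shares).

```r
# Toy data (hypothetical, not the question's predictions): 3 cases, 2 classes.
probs  <- matrix(c(0.60, 0.40,
                   0.30, 0.70,
                   0.55, 0.45), ncol = 2, byrow = TRUE)
target <- c(1/3, 2/3)   # hypothetical target class shares

# Weight each column, pick the winning class per row, compare shares to target.
step_objective <- function(w) {
  picked <- max.col(sweep(probs, 2, w, "*"), ties.method = "first")
  shares <- tabulate(picked, nbins = ncol(probs)) / nrow(probs)
  sum(abs(shares - target))
}

step_objective(c(1, 1))       # 2/3
step_objective(c(1.001, 1))   # still 2/3: no row changes its winner
step_objective(c(1, 1.3))     # 0: the third row flips to class 2
```

A local search that only probes points close to c(1, 1) sees identical values everywhere and has nothing to follow.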

Using method = "SANN" (simulated annealing) gives a better result, because it proposes random moves that can jump across the flat regions. I am not sure whether it is the best result you can get with that sample dataset.

I also made some simplifications to your function.

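fn <- function(v) {

  # Scale each column by its own weight. A plain `predictions * v` would
  # recycle v down the rows instead of across the columns.
  weighted_preds = sweep(as.matrix(predictions), 2, v, "*")

  result = apply(weighted_preds, 1, which.max) # index of the highest weighted probability

  # Fixed levels keep the table at 2 x 3 even if a class is never predicted
  result = factor(result, levels = 1:3)
  actualprobs = prop.table(table(mtcars[["gender"]], result))

  # Total absolute deviation from the target distribution
  sum(abs(real.probs - actualprobs))
}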

optim(c(100, 1, 1), fn, method="SANN")

Try different starting values to see if you can get an improvement. Increasing the number of predictions should also help, since more rows make the steps in the objective finer.
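Trying several starting values can be automated with a small loop. The following is only a sketch on made-up toy data: probs, target, step_objective and the list of starts are hypothetical, and the maxit value is an arbitrary choice. With the real data you would pass fn and three-element start vectors instead.

```r
# Toy data (hypothetical, not the question's predictions): 3 cases, 2 classes.
probs  <- matrix(c(0.60, 0.40,
                   0.30, 0.70,
                   0.55, 0.45), ncol = 2, byrow = TRUE)
target <- c(1/3, 2/3)   # hypothetical target class shares

# Step-shaped objective with the same structure as fn().
step_objective <- function(w) {
  picked <- max.col(sweep(probs, 2, w, "*"), ties.method = "first")
  shares <- tabulate(picked, nbins = ncol(probs)) / nrow(probs)
  sum(abs(shares - target))
}

set.seed(42)  # SANN is stochastic; a fixed seed makes the runs repeatable

# Hypothetical starting points; any spread of weights would do.
starts <- list(c(1, 1), c(10, 1), c(1, 10), c(0.1, 5))

# Run SANN once per starting point.
runs <- lapply(starts, function(s) {
  optim(s, step_objective, method = "SANN", control = list(maxit = 2000))
})

# Keep the run with the smallest objective value.
values <- vapply(runs, function(r) r$value, numeric(1))
best   <- runs[[which.min(values)]]
best$par     # weights of the best run
best$value   # deviation reached by the best run
```

Because only the ratio between the weights decides which class wins, many different weight vectors give the same value, so the reported par is one of many equivalent answers.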