I have implemented a wrapper function around optim that minimizes a non-standard (piecewise linear) cost function. (Think lm, but with the least-squares criterion replaced by a custom cost function.)
This process works very well for low-dimensional mock data. However, my data (which unfortunately I cannot share) is relatively high-dimensional (~100 columns, no constant columns).
Using this high-dimensional data, the optimized parameters differ from their initial values by only ~0.001 (more or less depending on the method used), even with the argument control = list(maxit = 5000) and after trying all implemented optimization methods:
optim(
  par = c(mean(b), rep(0, ncol(A) - 1)),   # initial parameter vector
  fn = costFn, A = A, b = b,
  method = "CG",
  control = list(maxit = 5000)
)
Here A is the matrix of predictors, b is the dependent variable, and costFn is defined as:
costFn <- function(A, par, b) {
  r <- A %*% par - b                                # residuals
  sum(wPlus * pmax(r, 0) + wMinus * pmax(-r, 0))    # asymmetric L1 penalty
}
with constants wPlus, wMinus > 0.
I feel like this process terminates way too fast for 5000 iterations (< 2 seconds).
Regarded as a function of a single residual, the cost t -> wPlus*max(t, 0) + wMinus*max(-t, 0) is convex, and the residuals A %*% par - b are affine in par. Since composing a convex function with an affine map preserves convexity, and a sum of convex functions is convex, the cost function is also convex regarded as a function R^n -> R. Therefore the methods should converge to the global minimum (and not get stuck in local minima, as there are no others). Correct me if I am wrong here.
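A quick numeric check of midpoint convexity on random mock data (the names Am, bm, p1, p2 are just for illustration; wPlus and wMinus as above) is consistent with this:

set.seed(1)
Am <- matrix(rnorm(200 * 100), 200, 100)    # mock predictors
bm <- rnorm(200)                            # mock dependent variable
p1 <- rnorm(100); p2 <- rnorm(100)          # two arbitrary parameter vectors
costFn(Am, (p1 + p2) / 2, bm) <= (costFn(Am, p1, bm) + costFn(Am, p2, bm)) / 2
# TRUE for any wPlus, wMinus > 0, by convexity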
My question is whether I have made an obvious mistake (which, I know, is hard to tell since I cannot provide a reprex) or whether there are better alternatives to optim.
Possible mistakes include not using the maxit argument correctly.

This cost function is non-differentiable, so you can expect local, gradient-based NLP solvers to get into trouble. However, what you have is a slight generalization of L1-norm or LAD regression:
min sum(k, |r[k]|)
r = Ax - b
r, x free variables
This is basically your problem with wPlus = wMinus = 1. This L1 problem can be stated as a Linear Programming (LP) model as follows:
min sum(k, rPlus[k] + rMin[k])
rPlus - rMin = Ax - b
rPlus[k], rMin[k] >= 0
x free variable
LP solvers for use under R are readily available.
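For example, here is a minimal sketch using the lpSolve package (the setup is illustrative; A and b are as in your question). Note that lp() treats all decision variables as nonnegative, so the free x is also written as a difference of two nonnegative vectors:

library(lpSolve)

m <- nrow(A); n <- ncol(A)

# Decision variables, in order: xPlus (n), xMin (n), rPlus (m), rMin (m)
obj <- c(rep(0, 2 * n), rep(1, 2 * m))        # min sum(rPlus + rMin)
con <- cbind(A, -A, -diag(m), diag(m))        # A %*% (xPlus - xMin) - rPlus + rMin = b
sol <- lp("min", obj, con, rep("=", m), b)

x <- sol$solution[1:n] - sol$solution[(n + 1):(2 * n)]   # recover the free x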
The modeling technique used here is called variable splitting [link]. In the optimal solution, at most one of each pair (rPlus[k], rMin[k]) will be non-zero.
We can use this formulation for your problem by just adjusting the objective a little bit:
min sum(k, wPlus*rPlus[k] + wMinus*rMin[k])
rPlus - rMin = Ax - b
rPlus[k], rMin[k] >= 0
x free variable
LP problems with many variables and many constraints can be solved very efficiently using a good LP solver.
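In the sketch above only the objective coefficients change, and the optimal objective value can be checked directly against costFn from the question:

obj <- c(rep(0, 2 * n), rep(wPlus, m), rep(wMinus, m))   # weighted residual costs
sol <- lp("min", obj, con, rep("=", m), b)
x   <- sol$solution[1:n] - sol$solution[(n + 1):(2 * n)]

stopifnot(isTRUE(all.equal(sol$objval, costFn(A, x, b))))  # LP optimum equals the cost at x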