Search code examples
roptimizationnanarbitrary-precision

Optimization for functions that produce NaN for some initial values


I would like to find all local minimums of the following objective function

func <- function(b){Mat=matrix(c(+0.5*1/((1/(exp(-b[1]-b[2]*-5)+1))*(1-(1/(exp(-b[1]-b[2]*-5)+1))))*exp(-b[1] - b[2] * -5)/(exp(-b[1] - b[2] * -5) + 1)^2*exp(-b[1] - b[2] * -5)/(exp(-b[1] - b[2] * -5) + 1)^2+0.5*1/((1/(exp(-b[1]-b[2]*5)+1))*(1-(1/(exp(-b[1]-b[2]*5)+1))))*exp(-b[1] - b[2] * 5)/(exp(-b[1] - b[2] * 5) + 1)^2*exp(-b[1] - b[2] * 5)/(exp(-b[1] - b[2] * 5) + 1)^2,+0.5*1/((1/(exp(-b[1]-b[2]*-5)+1))*(1-(1/(exp(-b[1]-b[2]*-5)+1))))*exp(-b[1] - b[2] * -5) * -5/(exp(-b[1] - b[2] * -5) + 1)^2*exp(-b[1] - b[2] * -5)/(exp(-b[1] - b[2] * -5) + 1)^2+0.5*1/((1/(exp(-b[1]-b[2]*5)+1))*(1-(1/(exp(-b[1]-b[2]*5)+1))))*exp(-b[1] - b[2] * 5) * 5/(exp(-b[1] - b[2] * 5) + 1)^2*exp(-b[1] - b[2] * 5)/(exp(-b[1] - b[2] * 5) + 1)^2,+0.5*1/((1/(exp(-b[1]-b[2]*-5)+1))*(1-(1/(exp(-b[1]-b[2]*-5)+1))))*exp(-b[1] - b[2] * -5)/(exp(-b[1] - b[2] * -5) + 1)^2*exp(-b[1] - b[2] * -5) * -5/(exp(-b[1] - b[2] * -5) + 1)^2+0.5*1/((1/(exp(-b[1]-b[2]*5)+1))*(1-(1/(exp(-b[1]-b[2]*5)+1))))*exp(-b[1] - b[2] * 5)/(exp(-b[1] - b[2] * 5) + 1)^2*exp(-b[1] - b[2] * 5) * 5/(exp(-b[1] - b[2] * 5) + 1)^2,+0.5*1/((1/(exp(-b[1]-b[2]*-5)+1))*(1-(1/(exp(-b[1]-b[2]*-5)+1))))*exp(-b[1] - b[2] * -5) * -5/(exp(-b[1] - b[2] * -5) + 1)^2*exp(-b[1] - b[2] * -5) * -5/(exp(-b[1] - b[2] * -5) + 1)^2+0.5*1/((1/(exp(-b[1]-b[2]*5)+1))*(1-(1/(exp(-b[1]-b[2]*5)+1))))*exp(-b[1] - b[2] * 5) * 5/(exp(-b[1] - b[2] * 5) + 1)^2*exp(-b[1] - b[2] * 5) * 5/(exp(-b[1] - b[2] * 5) + 1)^2),2,2);d=(det(Mat));return(d)}

'func' is determinant of Fisher information matrix of Logistic regression model and is a function of parameters b1 and b2 where b1 belongs to [-.3, .3] and b2 to [6, 8]

Suppose these two initial values for b = c(b1, b2)

> in1 <- c(-0.04785405, 6.42711047)
> in2 <- c(0.2246729, 7.5211575)

The local minimum with initial value in1 is:

> optim(in1, fn = func, lower = c(-.3, 6), upper = c(.3, 8), method = "L-BFGS-B")

$par
[1] -0.04785405  6.42711047

$value
[1] 3.07185e-27

$counts
function gradient 
   1        1 

$convergence
[1] 52

$message
[1] "ERROR: ABNORMAL_TERMINATION_IN_LNSRCH"

As can be seen in the $massage a termination happened in optimization process and minimum could not be computed and optim returned in1 as local optima.

For 'in2' also an error is appeared:

> optim(in2, fn = func, lower = c(-.3, 6), upper = c(.3, 8), method = "L-BFGS-B")

Error in optim(in2, fn = func, lower = c(-0.3, 6), upper = c(0.3, 8),  : 
L-BFGS-B needs finite values of 'fn'

This error happened because the value of func for in2' isNaN`:

> func(in2)
[1] NaN

However for in1 the value of objective function at in1 is calculated but the optimization is terminated because optim could not continue the calculation for another intial values:

> func(in1)
[1] 3.07185e-27

Let me define func without det and just as matrix to see what happened:

Mat.func <- function(b){Mat=matrix(c(+0.5*1/((1/(exp(-b[1]-b[2]*-5)+1))*(1-(1/(exp(-b[1]-b[2]*-5)+1))))*exp(-b[1] - b[2] * -5)/(exp(-b[1] - b[2] * -5) + 1)^2*exp(-b[1] - b[2] * -5)/(exp(-b[1] - b[2] * -5) + 1)^2+0.5*1/((1/(exp(-b[1]-b[2]*5)+1))*(1-(1/(exp(-b[1]-b[2]*5)+1))))*exp(-b[1] - b[2] * 5)/(exp(-b[1] - b[2] * 5) + 1)^2*exp(-b[1] - b[2] * 5)/(exp(-b[1] - b[2] * 5) + 1)^2,+0.5*1/((1/(exp(-b[1]-b[2]*-5)+1))*(1-(1/(exp(-b[1]-b[2]*-5)+1))))*exp(-b[1] - b[2] * -5) * -5/(exp(-b[1] - b[2] * -5) + 1)^2*exp(-b[1] - b[2] * -5)/(exp(-b[1] - b[2] * -5) + 1)^2+0.5*1/((1/(exp(-b[1]-b[2]*5)+1))*(1-(1/(exp(-b[1]-b[2]*5)+1))))*exp(-b[1] - b[2] * 5) * 5/(exp(-b[1] - b[2] * 5) + 1)^2*exp(-b[1] - b[2] * 5)/(exp(-b[1] - b[2] * 5) + 1)^2,+0.5*1/((1/(exp(-b[1]-b[2]*-5)+1))*(1-(1/(exp(-b[1]-b[2]*-5)+1))))*exp(-b[1] - b[2] * -5)/(exp(-b[1] - b[2] * -5) + 1)^2*exp(-b[1] - b[2] * -5) * -5/(exp(-b[1] - b[2] * -5) + 1)^2+0.5*1/((1/(exp(-b[1]-b[2]*5)+1))*(1-(1/(exp(-b[1]-b[2]*5)+1))))*exp(-b[1] - b[2] * 5)/(exp(-b[1] - b[2] * 5) + 1)^2*exp(-b[1] - b[2] * 5) * 5/(exp(-b[1] - b[2] * 5) + 1)^2,+0.5*1/((1/(exp(-b[1]-b[2]*-5)+1))*(1-(1/(exp(-b[1]-b[2]*-5)+1))))*exp(-b[1] - b[2] * -5) * -5/(exp(-b[1] - b[2] * -5) + 1)^2*exp(-b[1] - b[2] * -5) * -5/(exp(-b[1] - b[2] * -5) + 1)^2+0.5*1/((1/(exp(-b[1]-b[2]*5)+1))*(1-(1/(exp(-b[1]-b[2]*5)+1))))*exp(-b[1] - b[2] * 5) * 5/(exp(-b[1] - b[2] * 5) + 1)^2*exp(-b[1] - b[2] * 5) * 5/(exp(-b[1] - b[2] * 5) + 1)^2),2,2);d=Mat;return(d)}

We get

         > Mat.func(in1)
              [,1]         [,2]
         [1,] 1.109883e-14 2.784007e-15
         [2,] 2.784007e-15 2.774708e-13

        > Mat.func(in2)
              [,1] [,2]
          [1,]  Inf  Inf
          [2,]  Inf  Inf

Hence, by double precision, values of Mat.func(in2) elements are Inf. I also rewrite Mat.func with mpfr function:

Mat.func.mpfr <-function(b, prec){ d=c(+0.5*1/((1/(exp(-mpfr(b[1], precBits = prec)-mpfr(b[2], precBits = prec)*-5)+1))*(1-(1/(exp(-mpfr(b[1], precBits = prec)-mpfr(b[2], precBits = prec)*-5)+1))))*exp(-mpfr(b[1], precBits = prec) - mpfr(b[2], precBits = prec) * -5)/(exp(-mpfr(b[1], precBits = prec) - mpfr(b[2], precBits = prec) * -5) + 1)^2*exp(-mpfr(b[1], precBits = prec) - mpfr(b[2], precBits = prec) * -5)/(exp(-mpfr(b[1], precBits = prec) - mpfr(b[2], precBits = prec) * -5) + 1)^2+0.5*1/((1/(exp(-mpfr(b[1], precBits = prec)-mpfr(b[2], precBits = prec)*5)+1))*(1-(1/(exp(-mpfr(b[1], precBits = prec)-mpfr(b[2], precBits = prec)*5)+1))))*exp(-mpfr(b[1], precBits = prec) - mpfr(b[2], precBits = prec) * 5)/(exp(-mpfr(b[1], precBits = prec) - mpfr(b[2], precBits = prec) * 5) + 1)^2*exp(-mpfr(b[1], precBits = prec) - mpfr(b[2], precBits = prec) * 5)/(exp(-mpfr(b[1], precBits = prec) - mpfr(b[2], precBits = prec) * 5) + 1)^2,
                                   +0.5*1/((1/(exp(-mpfr(b[1], precBits = prec)-mpfr(b[2], precBits = prec)*-5)+1))*(1-(1/(exp(-mpfr(b[1], precBits = prec)-mpfr(b[2], precBits = prec)*-5)+1))))*exp(-mpfr(b[1], precBits = prec) - mpfr(b[2], precBits = prec) * -5) * -5/(exp(-mpfr(b[1], precBits = prec) - mpfr(b[2], precBits = prec) * -5) + 1)^2*exp(-mpfr(b[1], precBits = prec) - mpfr(b[2], precBits = prec) * -5)/(exp(-mpfr(b[1], precBits = prec) - mpfr(b[2], precBits = prec) * -5) + 1)^2+0.5*1/((1/(exp(-mpfr(b[1], precBits = prec)-mpfr(b[2], precBits = prec)*5)+1))*(1-(1/(exp(-mpfr(b[1], precBits = prec)-mpfr(b[2], precBits = prec)*5)+1))))*exp(-mpfr(b[1], precBits = prec) - mpfr(b[2], precBits = prec) * 5) * 5/(exp(-mpfr(b[1], precBits = prec) - mpfr(b[2], precBits = prec) * 5) + 1)^2*exp(-mpfr(b[1], precBits = prec) - mpfr(b[2], precBits = prec) * 5)/(exp(-mpfr(b[1], precBits = prec) - mpfr(b[2], precBits = prec) * 5) + 1)^2,
                                   +0.5*1/((1/(exp(-mpfr(b[1], precBits = prec)-mpfr(b[2], precBits = prec)*-5)+1))*(1-(1/(exp(-mpfr(b[1], precBits = prec)-mpfr(b[2], precBits = prec)*-5)+1))))*exp(-mpfr(b[1], precBits = prec) - mpfr(b[2], precBits = prec) * -5)/(exp(-mpfr(b[1], precBits = prec) - mpfr(b[2], precBits = prec) * -5) + 1)^2*exp(-mpfr(b[1], precBits = prec) - mpfr(b[2], precBits = prec) * -5) * -5/(exp(-mpfr(b[1], precBits = prec) - mpfr(b[2], precBits = prec) * -5) + 1)^2+0.5*1/((1/(exp(-mpfr(b[1], precBits = prec)-mpfr(b[2], precBits = prec)*5)+1))*(1-(1/(exp(-mpfr(b[1], precBits = prec)-mpfr(b[2], precBits = prec)*5)+1))))*exp(-mpfr(b[1], precBits = prec) - mpfr(b[2], precBits = prec) * 5)/(exp(-mpfr(b[1], precBits = prec) - mpfr(b[2], precBits = prec) * 5) + 1)^2*exp(-mpfr(b[1], precBits = prec) - mpfr(b[2], precBits = prec) * 5) * 5/(exp(-mpfr(b[1], precBits = prec) - mpfr(b[2], precBits = prec) * 5) + 1)^2,
                                   +0.5*1/((1/(exp(-mpfr(b[1], precBits = prec)-mpfr(b[2], precBits = prec)*-5)+1))*(1-(1/(exp(-mpfr(b[1], precBits = prec)-mpfr(b[2], precBits = prec)*-5)+1))))*exp(-mpfr(b[1], precBits = prec) - mpfr(b[2], precBits = prec) * -5) * -5/(exp(-mpfr(b[1], precBits = prec) - mpfr(b[2], precBits = prec) * -5) + 1)^2*exp(-mpfr(b[1], precBits = prec) - mpfr(b[2], precBits = prec) * -5) * -5/(exp(-mpfr(b[1], precBits = prec) - mpfr(b[2], precBits = prec) * -5) + 1)^2+0.5*1/((1/(exp(-mpfr(b[1], precBits = prec)-mpfr(b[2], precBits = prec)*5)+1))*(1-(1/(exp(-mpfr(b[1], precBits = prec)-mpfr(b[2], precBits = prec)*5)+1))))*exp(-mpfr(b[1], precBits = prec) - mpfr(b[2], precBits = prec) * 5) * 5/(exp(-mpfr(b[1], precBits = prec) - mpfr(b[2], precBits = prec) * 5) + 1)^2*exp(-mpfr(b[1], precBits = prec) - mpfr(b[2], precBits = prec) * 5) * 5/(exp(-mpfr(b[1], precBits = prec) - mpfr(b[2], precBits = prec) * 5) + 1)^2)
                               Mat = new("mpfrMatrix", d, Dim = c(2L, 2L))
                               return(Mat)}

Thus:

require(Rmpfr)
> Mat.func.mpfr(c(in1), prec = 54)
'mpfrMatrix' of dim(.) =  (2, 2) of precision  54   bits 
     [,1]                   
 [1,] 1.10988301365972506e-14
 [2,] 2.78400749725484580e-15
      [,2]                   
 [1,] 2.78400749725484580e-15
 [2,] 2.77470753414931256e-13

 > Mat.func.mpfr(c(in2), prec = 54)
 'mpfrMatrix' of dim(.) =  (2, 2) of precision  54   bits 
      [,1] [,2]
 [1,]  Inf  Inf
 [2,]  Inf  Inf

 > Mat.func.mpfr(c(in2), prec = 55)
 'mpfrMatrix' of dim(.) =  (2, 2) of precision  55   bits 
      [,1]                    
 [1,]  4.16032108702067276e-17
 [2,] -8.34300174643550123e-17
      [,2]                    
 [1,] -8.34300174643550154e-17
 [2,]  1.04008027175516816e-15

So by precision 55 the values of matrix elements are not Inf anymore. unfortunately, mpfr function changes the class of an objective and nor det neither r optimization functions can not be applied, to clarify I provide two examples:

> class(mpfr (1/3, 54))
[1] "mpfr"
attr(,"package")
[1] "Rmpfr"

## determinant
example1 <- function(x){
  d <- c(mpfr(x, prec = 54), 3 * mpfr(x, prec = 54), 5 * mpfr(x, prec = 54), 7 * mpfr(x, prec = 54))
  Mat = new("mpfrMatrix", d, Dim = c(2L, 2L))
  return(det(Mat))
}

> example1(2)
Error in UseMethod("determinant") : 
no applicable method for 'determinant' applied to an object of class "c('mpfrMatrix',    'mpfrArray', 'Mnumber', 'mNumber', 'mpfr', 'list', 'vector')"

##optimization 
example2 <- function(x)  ## Rosenbrock Banana function
   100 * (mpfr(x[2], prec = 54) - mpfr(x[1], prec = 54) * mpfr(x[1], prec = 54 ))^2 + (1 - mpfr(x[1], prec = 54))^2

> example2(c(-1.2, 1))
1 'mpfr' number of precision  54   bits 
[1] 24.1999999999999957
> optim(c(-1.2,1), example2)
Error in optim(c(-1.2, 1), example2) : 
(list) object cannot be coerced to type 'double'

Hence, using mpfr could not solve the problem.

To find All the local minimums, an algorithm which applies different random initial values should be written. But as can be seen, for some initial values the function produces NaN (ignorance of these values would not be a good idea because it may generally results in missing some local minimums ,specially for functions that have lots of local optima).

I was wondering if there is any R package that can carry on optimization process with arbitrary precision to avoid NaN for objective functions?

Thank you


Solution

  • I think the answer (I think 'agstudy' gave, too) is: Make sure that the function you minimize does NOT return NaN (or NA) but rather +Inf (if you minimize, or -Inf if you maximize).

    2nd: Instead of log(det(.)) you REALLY should use
    { r <- determinant(., log=TRUE) ; if(r$sign <= 0) -Inf else r$modulus }

    which is also more accurate. {Hint: do look how det is defined in R !}

    Now to Rmpfr, I will reply separately. It should work like standard R to use "mpfr"-numbers, .... says the author of Rmpfr .... but you may need a bit of care. tryCatch() should not be needed, however.