
Linear Dependence errors in DiceKriging with linearly independent data


The Problem (with minimal working example)

DiceKriging gives linear dependence errors roughly half the time when the data is close to (but not actually) linearly dependent. The minimal example below failed about half the time when I ran it on both an Ubuntu and a Windows machine, with either genetic ("gen") or "BFGS" optimisation.

install.packages("DiceKriging")
library(DiceKriging)

x = data.frame(Param1 = c(0,1,2,2,2,2), Param2 = c(2,0,1,1e-7,0,2))
y = 1:6
duplicated(cbind(x,y))

Model = km( design = x, response = y , optim.method = "gen", control = list(maxit = 1e4), coef.cov = 1)
Model = km( design = x, response = y , optim.method = "BFGS", control = list(maxit = 1e4), coef.cov = 1)

When the data is a little more dispersed, no such errors occur.

# No problems occur if the data is more dispersed.
x = data.frame(Param1 = c(0,1,2,2,2,2), Param2 = c(2,0,1,1e-2,0,2))
y = 1:6
duplicated(cbind(x,y))

Model = km( design = x, response = y , optim.method = "gen", control = list(maxit = 1e4), coef.cov = 1)

Why this is a problem

Using Kriging to optimize expensive models means that points near the optima will be sampled densely, which is impossible while this error keeps occurring. Moreover, the separation at which the error appears is not fixed at the 1e-7 above: in my actual problem (not the MWE), it was triggered when all 4 coordinates of one point were only about 1e-3 away from those of another point.
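The numerical cause can be seen directly: with a stationary kernel, two nearly coincident design points produce two nearly identical rows in the correlation matrix, so its condition number explodes. A minimal base-R sketch (assuming a unit-range squared-exponential kernel for illustration; DiceKriging's internals differ in detail):

```r
# Correlation matrix under a squared-exponential (Gaussian) kernel.
# theta = 1 is an assumed range parameter, for illustration only.
gauss_corr <- function(X, theta = 1) {
  D <- as.matrix(dist(X))      # pairwise Euclidean distances
  exp(-(D / theta)^2)
}

X_close <- cbind(c(0, 1, 2, 2, 2, 2), c(2, 0, 1, 1e-7, 0, 2))  # rows 4 and 5 almost coincide
X_far   <- cbind(c(0, 1, 2, 2, 2, 2), c(2, 0, 1, 1e-2, 0, 2))

kappa(gauss_corr(X_close))   # enormous condition number: near-singular
kappa(gauss_corr(X_far))     # orders of magnitude smaller
```

When the matrix is this ill-conditioned, the Cholesky factorisation inside the fitting routine can fail, which is what surfaces as the "linear dependence" error.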

Related Questions

There are not many DiceKriging questions on Stack Overflow. The closest is this one (about the Kriging package), in which the problem is genuine linear dependence. Note that the Kriging package is not a substitute for DiceKriging, since it is restricted to 2 dimensions.

Desired Solution

I would like either:

  • A way to change my km call so that it avoids this problem (preferred)
  • A way to determine in advance when this problem will occur, so that I can drop observations that are too close to each other before the km call.
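For the second option, a simple pre-filter on pairwise distances is enough to flag the offending observations. A hypothetical helper (the name drop_close_points and the tol threshold are my own invention, and tol would need tuning for a given problem):

```r
# Drop any design point lying within 'tol' of an earlier, kept point,
# so the remaining design is safe to pass to km().
drop_close_points <- function(X, tol = 1e-4) {
  D <- as.matrix(dist(X))        # pairwise Euclidean distances
  keep <- rep(TRUE, nrow(X))
  for (i in seq_len(nrow(X))[-1]) {
    earlier <- seq_len(i - 1)
    if (any(D[i, earlier][keep[earlier]] < tol)) keep[i] <- FALSE
  }
  X[keep, , drop = FALSE]
}

x <- data.frame(Param1 = c(0, 1, 2, 2, 2, 2), Param2 = c(2, 0, 1, 1e-7, 0, 2))
drop_close_points(x)   # row 5, the near-duplicate of row 4, is dropped
```

Note that dropping points also discards their responses, so this is a blunt instrument compared with changing the km call itself.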

Solution

  • Your problem is not a software problem; it is a mathematical one.

    Your first data set contains the two points (2, 1e-7) and (2, 0), which are extremely close but correspond to very different outputs: 4 and 5. You are therefore trying to fit a Kriging model, which is an interpolation model, to a chaotic response. This cannot work. Kriging/Gaussian process modelling is not the right tool when the response varies a lot between points that are close together.

    However, when you are optimizing expensive models, things are usually not like in this example: very close input points do not produce wildly different responses. But very close points can indeed still cause numerical problems.

    To soften these numerical errors, you can add a nugget effect. The nugget is a small constant variance added to the diagonal of the covariance matrix, which means the points no longer need to be interpolated exactly: the kriging approximation is not forced to pass through the learning points, and the kriging model becomes a regression model.
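    This regression behaviour is easy to see in a stripped-down base-R sketch of a GP mean (a toy illustration, not DiceKriging's actual internals):

```r
x <- c(0, 1e-7, 1)                     # two almost coincident inputs...
y <- c(4, 5, 1)                        # ...with very different responses
K <- exp(-outer(x, x, "-")^2)          # squared-exponential correlations
alpha  <- solve(K + diag(1e-4, 3), y)  # nugget 1e-4 added to the diagonal
fitted <- K %*% alpha                  # GP mean at the training points
fitted                                 # first two values are both ~4.5, not 4 and 5
```

    The nugget both regularises the linear solve (the matrix without it is numerically singular) and averages the conflicting responses at the near-duplicate inputs.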

    In DiceKriging, the nugget can be added in two ways. You can either choose a value a priori and add it "manually" by setting km(..., nugget = your_value), or ask km to estimate it at the same time as the parameters of the covariance function by setting km(..., nugget.estim = TRUE). In general I would advise the second option.

    Your small example then becomes:

    Model = km( design = x, response = y , optim.method = "BFGS",
                control = list(maxit = 1e4), coef.cov = 1, nugget.estim = TRUE)
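    For completeness, the first option (a fixed, hand-chosen nugget) would look like the call below; the value 1e-8 * var(y) is only an illustrative heuristic of mine, not a documented recommendation:

```r
# Fixed nugget chosen a priori; the scale here is an assumption, not a rule
Model = km( design = x, response = y , optim.method = "BFGS",
            control = list(maxit = 1e4), coef.cov = 1,
            nugget = 1e-8 * var(y) )
```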
    

    Extract from the DiceKriging documentation:

    nugget: an optional variance value standing for the homogeneous nugget effect.

    nugget.estim: an optional boolean indicating whether the nugget effect should be estimated. Note that this option does not concern the case of heterogeneous noisy observations (see noise.var below). If nugget is given, it is used as an initial value. Default is FALSE.

    PS: The choice of the covariance kernel can be important in some applications. When the function to approximate is quite rough, the exponential kernel (set covtype="exp" in km) can be a good choice. See Rasmussen and Williams' book for more information (freely and legally available at http://www.gaussianprocess.org/)
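    For example, on the toy data above (a sketch; covtype and nugget.estim are documented km arguments):

```r
# Exponential kernel: suited to rougher functions than the default kernel
Model = km( design = x, response = y , covtype = "exp", nugget.estim = TRUE )
```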