
Get caret's model-parameters for nnet


I am having trouble extracting caret's finalModel parameters for nnet. If I pass what I believe are exactly the same parameters to caret::train and nnet::nnet, the results sometimes differ substantially. Have I forgotten a parameter, or is this due to how the neural network is fitted? I am aware that I can use predict on caret_net (in the example below), but I would still like to reproduce the results with nnet alone.

Example:

library(nnet)
library(caret)

len <- 100
set.seed(4321)
X <- data.frame(x1 = rnorm(len, 40, 25), x2 = rnorm(len, 70, 4), x3 = rnorm(len, 1.6, 0.3))
y <- 20000 + X$x1 * 3 - X$x1*X$x2 * 4 - (X$x3**4) * 7 + rnorm(len, 0, 4)
XY <- cbind(X, y)

# pre-processing
preProcPrms <- preProcess(XY, method = c("center", "scale"))
XY_pre <- predict(preProcPrms, XY)

# caret-nnet
controlList <- trainControl(method = "cv", number = 5)
tuneMatrix <- expand.grid(size = c(1, 2), decay = c(0, 0.1))

caret_net <- train(x = XY_pre[ , colnames(XY_pre) != "y"],
                   y = XY_pre[ , colnames(XY_pre) == "y"],
                   method = "nnet",
                   linout = TRUE,
                   trace = FALSE,  # nnet's argument is lower-case; TRACE is silently ignored
                   maxit = 100,
                   tuneGrid = tuneMatrix,
                   trControl = controlList)

# nnet-nnet
nnet_net <- nnet(x = XY_pre[ , colnames(XY_pre) != "y"],
                 y = XY_pre[ , colnames(XY_pre) == "y"],
                 linout = caret_net$finalModel$param$linout,
                 trace = caret_net$finalModel$param$trace,
                 size = caret_net$bestTune$size,
                 decay = caret_net$bestTune$decay,
                 entropy = caret_net$finalModel$entropy,
                 maxit = 100)

# print
print(caret_net$finalModel)
print(nnet_net)

y_caret <- predict(caret_net$finalModel, XY_pre[ , colnames(XY_pre) != "y"])
y_nnet <- predict(nnet_net, XY_pre[ , colnames(XY_pre) != "y"])

plot(y_caret, y_nnet, main = "Hard to spot, but y_caret <> y_nnet - which prm have I forgotten?")
hist(y_caret - y_nnet)

Thx & kind regards


Solution

  • As stated in the comments, the discrepancy is caused by different seeds. To quote @Artem Sokolov: "Neural net training typically starts from a random state. It's reasonable to expect that caret::train and nnet::nnet start from two different states. Consequently, they likely converge to two different local optima."

    To get a reproducible model, start both fits from the same seed:

    controlList <- trainControl(method = "none", seeds = 1)
    tuneMatrix <- expand.grid(size = 2, decay = 0)
    
    set.seed(1)
    caret_net <- train(x = XY_pre[ , colnames(XY_pre) != "y"],
                       y = XY_pre[ , colnames(XY_pre) == "y"],
                       method = "nnet",
                       linout = TRUE,
                       trace = FALSE,  # nnet's argument is lower-case; TRACE is silently ignored
                       maxit = 100,
                       tuneGrid = tuneMatrix,
                       trControl = controlList)
    
    set.seed(1)
    nnet_net <- nnet(x = XY_pre[ , colnames(XY_pre) != "y"],
                     y = XY_pre[ , colnames(XY_pre) == "y"],
                     linout = caret_net$finalModel$param$linout,
                     trace = caret_net$finalModel$param$trace,
                     size = caret_net$bestTune$size,
                     decay = caret_net$bestTune$decay,
                     entropy = caret_net$finalModel$entropy,
                     maxit = 100)
    
    y_caret <- predict(caret_net, XY_pre[ , colnames(XY_pre) != "y"])
    y_nnet <- predict(nnet_net, XY_pre[ , colnames(XY_pre) != "y"])
    
    
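    # as.vector() drops dimensions and names, so this works whether predict() returns a vector or a one-column matrix
    all.equal(as.vector(y_caret), as.vector(y_nnet))
    # TRUE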
    

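    Apart from setting the same seed, the key is to avoid resampling in caret. Resampling depends on the seed and precedes the model training, so it draws random numbers before the final network is fitted and the random number generator is in a different state by the time the weights are initialised. That is why the example uses method = "none" in trainControl together with a single-row tuneGrid.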
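
The effect described above can be checked with nnet alone. The following is a minimal sketch, not caret's actual internals: the toy data, the helper fit_net, and the sample() call standing in for resampling are all assumptions made for illustration. It compares two fits started from the same seed, one fit where random numbers are drawn between set.seed and the fit, and two fits that receive explicit starting weights through nnet's Wts argument, which should remove the dependence on the random number generator altogether.

```r
library(nnet)

# Toy data (hypothetical, standing in for the pre-processed XY_pre)
set.seed(4321)
len <- 100
X <- data.frame(x1 = rnorm(len), x2 = rnorm(len), x3 = rnorm(len))
y <- as.numeric(scale(3 * X$x1 - 4 * X$x1 * X$x2 - 7 * X$x3^4 + rnorm(len, 0, 0.1)))

# Hypothetical helper: seed the RNG, optionally consume random numbers
# (a stand-in for resampling work done before the fit), then fit the net.
fit_net <- function(seed, draws_before_fit = 0, ...) {
  set.seed(seed)
  if (draws_before_fit > 0) {
    sample(len, draws_before_fit)  # result discarded; only advances the RNG
  }
  nnet(x = X, y = y, size = 2, decay = 0, linout = TRUE,
       trace = FALSE, maxit = 100, ...)
}

fit_a <- fit_net(seed = 1)
fit_b <- fit_net(seed = 1)
fit_c <- fit_net(seed = 1, draws_before_fit = len)

# Same seed, nothing drawn in between: identical starting weights and fit
same_seed_same_fit <- identical(fit_a$wts, fit_b$wts)

# Same seed, but the RNG was advanced before the fit: the starting weights
# differ, so the fitted weights are expected to differ as well
shifted_stream_same_fit <- isTRUE(all.equal(fit_a$wts, fit_c$wts))

# Explicit starting weights: (3 + 1) * 2 + (2 + 1) * 1 = 11 weights for
# 3 inputs, 2 hidden units and 1 output; the values are arbitrary
n_wts <- length(fit_a$wts)
start_wts <- seq(-0.5, 0.5, length.out = n_wts)
fit_d <- fit_net(seed = 1, Wts = start_wts)
fit_e <- fit_net(seed = 2, draws_before_fit = len, Wts = start_wts)
fixed_start_same_fit <- identical(fit_d$wts, fit_e$wts)

print(c(same_seed = same_seed_same_fit,
        shifted_stream = shifted_stream_same_fit,
        fixed_start = fixed_start_same_fit))
```

Since caret::train forwards extra arguments to nnet, supplying the same Wts vector to both calls is probably another way to line the two models up, but that has not been verified here and should be treated as an assumption.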