neural-network, r, r-caret, nnet

Nnet in caret, basic structure


I'm very new to the caret package and nnet in R. I've done some ANN projects in Matlab before, but now I need to work in R and need some basic help. My input dataset has 1000 observations (rows) and 23 variables (columns). My output has 1000 observations and 12 variables.

Here are some sample data that represent my dataset and might help to understand my problem better:

input = as.data.frame(matrix(sample(1 : 20, 100, replace = TRUE), ncol = 10))
colnames(input) = paste ( "X" , 1:10, sep = "") #10 observations and 10 variables

output = as.data.frame(matrix(sample(1 : 20, 70, replace = TRUE), ncol = 7))
colnames(output) = paste ( "Y" , 1:7, sep = "") #10 observations and 7 variables


#nnet with caret:
library(caret)
net1 = train(output ~., data = input, method= "nnet", maxit = 1000)

When I run the code, I get this error:

error: invalid type (list) for variable 'output'.

I think I have to add all output variables separately (which is very annoying, especially with a lot of variables), like this:

train(output$Y1 + output$Y2 + output$Y3 + output$Y4 + output$Y5 +
 output$Y6 + output$Y7 ~., data = input, method= "nnet", maxit = 1000)

This time I get a different error:

Error in [.data.frame(data, , all.vars(Terms), drop = FALSE) : undefined columns selected

I also tried the neuralnet package. With the code below it works perfectly, but I still have to add the output variables separately :(

net1 = neuralnet(output$Y1 + output$Y2 + output$Y3 + output$Y4 +
 output$Y5 + output$Y6 + output$Y7 ~., data = input, hidden=c(2,10))

P.S. Since these sample data are generated randomly, neuralnet cannot converge, but on my real data it works well (compared with the Matlab ANN).

If you could show me a way to include the output variables automatically (not manually), that would solve my problem (even if only with neuralnet rather than caret).


Solution

  • After trying different things and searching, I finally found a solution:

    First, we use as.formula to express the relation between the inputs and the outputs. With the code below we don't need to add all the variables separately:

    names1 <- colnames(output) # names of the output variables
    names2 <- colnames(input)  # names of the input variables

    # builds Y1 + ... + Y7 ~ X1 + ... + X10
    a <- as.formula(paste(paste(names1, collapse = " + "), "~",
                          paste(names2, collapse = " + ")))
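
    The formula step can also be wrapped in a small function and checked before any network is trained. The following is only a sketch: build_formula is a hypothetical helper name (it is not part of neuralnet or caret), and the toy data simply mirror the shapes used in the question.

```r
# Hypothetical helper (not part of neuralnet or caret): builds a formula of the
# form Y1 + Y2 + ... ~ X1 + X2 + ... from two character vectors of column names.
build_formula <- function(response_names, predictor_names) {
  # A column used on both sides would make the formula ambiguous.
  stopifnot(length(intersect(response_names, predictor_names)) == 0)
  lhs <- paste(response_names, collapse = " + ")
  rhs <- paste(predictor_names, collapse = " + ")
  as.formula(paste(lhs, "~", rhs))
}

# Toy data with the same shape as the example in the question.
input <- as.data.frame(matrix(sample(1:20, 100, replace = TRUE), ncol = 10))
colnames(input) <- paste0("X", 1:10)
output <- as.data.frame(matrix(sample(1:20, 70, replace = TRUE), ncol = 7))
colnames(output) <- paste0("Y", 1:7)

a <- build_formula(colnames(output), colnames(input))
print(a)

# all.vars() lists the variables in order of appearance: outputs, then inputs.
formula_vars <- all.vars(a)
```

    Checking all.vars(a) against the column names is a cheap way to confirm that no output or input column was dropped when the formula was pasted together.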
    

    Then we have to combine the input and output into a single data frame, because the formula refers to columns from both:

    all_data = cbind(output, input)
    

    then, use neuralnet like this:

    library(neuralnet)
    net1 = neuralnet(formula = a, data = all_data, hidden=c(2,10))
    
    plot(net1)
    

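    The same call also runs with the caret package, although there the + on the left-hand side is most likely treated as ordinary addition, so the model would be fitted to the sum of the outputs rather than to seven separate outputs: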

    net1 = train(a, data = all_data, method= "nnet", maxit = 1000)
    
    

    but it seems neuralnet works faster (at least in my case).
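
    One thing worth checking before relying on the caret version: neuralnet gives + on the left-hand side a special meaning (several outputs), whereas base R's formula machinery evaluates it as arithmetic. The sketch below uses only base R and assumes that caret's formula interface goes through the standard model.frame step, as most formula interfaces do. Under that assumption, the response handed to nnet is a single column holding the row sums of the outputs.

```r
# Toy data with the same shape as the example in the question.
input <- as.data.frame(matrix(sample(1:20, 100, replace = TRUE), ncol = 10))
colnames(input) <- paste0("X", 1:10)
output <- as.data.frame(matrix(sample(1:20, 70, replace = TRUE), ncol = 7))
colnames(output) <- paste0("Y", 1:7)
all_data <- cbind(output, input)

# Same multi-output formula as in the answer.
a <- as.formula(paste(paste(colnames(output), collapse = " + "), "~",
                      paste(colnames(input), collapse = " + ")))

# Standard R handling of the formula: the left-hand side is evaluated.
mf <- model.frame(a, data = all_data)
resp <- model.response(mf)

# The response is one plain vector, not a 7-column matrix ...
print(is.null(dim(resp)))

# ... and it equals Y1 + Y2 + ... + Y7 computed row by row.
lhs_is_sum <- isTRUE(all.equal(as.numeric(resp), as.numeric(rowSums(output))))
print(lhs_is_sum)
```

    If separate predictions per output are needed from caret, fitting one model per output column is the safer reading of this check.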

    I hope this helps someone else.