Neuralnet package in R big error


I am trying to figure out how to get the neuralnet package to work. I ran some tests on data I created myself (about 50 rows with three input columns and a fourth outcome column computed from them by simple arithmetic, such as summing the three inputs), and so far so good. Then I decided to apply the package to real data. I downloaded the mpg dataset from here: http://vincentarelbundock.github.io/Rdatasets/datasets.html

I was running the code below:

net<- neuralnet(cty~displ+year+cyl+hwy,
                datain, hidden=3)

Whether I use 3, 8, or 18 hidden neurons (the hidden argument), the error is the same, and the processing time is relatively short for this amount of data (234 rows):

        Error Reached Threshold Steps
1 2110.173077    0.006277805853    54

Any good advice for this?


Solution

  • It's probably a scale problem: you can normalize or scale (standardize) the data. Normalizing and scaling are different and will affect your results; the difference is worth a separate question on SO:

    normalize inputs

    norm.fun = function(x){ 
      (x - min(x))/(max(x) - min(x)) 
    }
    
    require(ggplot2) # load mpg dataset
    require(neuralnet)
    
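    # keep only the numeric columns used in the model
    data = mpg[, c('cty', 'displ', 'year', 'cyl', 'hwy')]
    # rescale every column to [0, 1]; apply() returns a numeric matrix
    data.norm = apply(data, 2, norm.fun)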
    
    net = neuralnet(cty ~ displ + year + cyl + hwy, data.norm, hidden = 2)
    

    The network's predictions are on the normalized scale, so you can then denormalize them to get back to the original units:

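    # restore data
    # range() returns c(min, max), so diff() is needed to get max - min;
    # data$cty extracts a plain vector even when mpg is a tibble
    y.net = min(data$cty) + net$net.result[[1]] * diff(range(data$cty))
    plot(data$cty, col = 'red')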
    points(y.net)
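
    The normalize and denormalize steps above can be sketched as a matched pair of helpers, so that the same min and range are used in both directions and can be reused on new rows. This is only a sketch in base R: the function names (minmax_fit, minmax_apply, minmax_invert) and the sample values are made up for illustration and are not part of neuralnet.

    ```r
    # Hypothetical helpers: store the min and range once, reuse them both ways.
    minmax_fit    <- function(x) list(min = min(x), range = max(x) - min(x))
    minmax_apply  <- function(x, p) (x - p$min) / p$range
    minmax_invert <- function(z, p) z * p$range + p$min

    y <- c(9, 14, 18, 21, 35)        # made-up values on an mpg-like scale
    p <- minmax_fit(y)

    z      <- minmax_apply(y, p)     # rescaled to [0, 1]
    y_back <- minmax_invert(z, p)    # back on the original scale

    # New rows are rescaled with the parameters learned from the training data.
    z_new <- minmax_apply(c(12, 28), p)
    ```

    Keeping the parameters in one list avoids recomputing min and max by hand when restoring predictions, which is where mistakes tend to creep in.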
    


    scale inputs

    data.scaled = scale(data)
    net = neuralnet(cty ~ displ + year + cyl + hwy, data.scaled, hidden = 2)
    
    # restore data 
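    # data$cty extracts a plain numeric vector; with a tibble, data[, 'cty']
    # stays a one-column table, which sd() and mean() cannot handle
    y.sd = sd(data$cty)
    y.mean = mean(data$cty)
    
    y.net = net$net.result[[1]] * y.sd + y.mean
    plot(data$cty, col = 'red')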
    points(y.net)
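
    As a small aside on the restore step above: scale() keeps the column means and standard deviations it used as the attributes "scaled:center" and "scaled:scale", so they can be read back instead of being recomputed. A minimal base R sketch, using a made-up two-column matrix rather than the mpg data:

    ```r
    # Made-up data standing in for two mpg columns.
    m <- cbind(cty = c(9, 14, 18, 21, 35),
               hwy = c(12, 20, 26, 29, 44))

    s <- scale(m)                      # each column: mean 0, sd 1

    ctr <- attr(s, "scaled:center")    # column means used by scale()
    scl <- attr(s, "scaled:scale")     # column standard deviations used by scale()

    # Undo the scaling for one column, as one would for predictions of cty.
    cty_back <- s[, "cty"] * scl["cty"] + ctr["cty"]
    ```

    Here cty_back should match the original cty column, which is the same arithmetic as the y.sd / y.mean lines above.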
    


    You can also try the nnet package; it's very fast:

    require(nnet)
    
    data2 = mpg
    data2$year = scale(data2$year)
    fit = nnet(cty ~ displ + year + cyl + hwy, size = 10, data = data2, linout = TRUE)
    plot(mpg$cty)
    points(fit$fitted.values, col = 'red')
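
    One hedged way to check the scale diagnosis: neuralnet's default error function is half the sum of squared errors, so a net that ignores its inputs and predicts the mean of cty for every row would report half the total sum of squares. The helper below (half_sse_of_mean is a made-up name) computes that baseline. If it comes out close to the 2110.17 reported in the question, the unscaled net has learned nothing beyond the mean, regardless of the number of hidden neurons. The ggplot2 lookup is optional and is skipped when the package is not installed.

    ```r
    # Hypothetical helper: the error (1/2 * SSE) of a model that always
    # predicts mean(y), whatever the inputs are.
    half_sse_of_mean <- function(y) 0.5 * sum((y - mean(y))^2)

    # Hand-checkable case: deviations are -1, 0, 1, so the result is 1.
    toy <- half_sse_of_mean(c(1, 2, 3))

    # Optional: compute the baseline for the question's target column.
    if (requireNamespace("ggplot2", quietly = TRUE)) {
      print(half_sse_of_mean(ggplot2::mpg$cty))
    }
    ```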