java, r, jri

Problems with R neural network results using JRI


So this is my problem.

I am using an R script to train a neural network that fills in the missing values of a file. The file looks like this:

Flag |     Date  | Time  | Value
V    |  20100901 | 00:00 | 23180
V    |  20100901 | 00:15 | 23280
V    |  20100901 | 00:30 |
V    |  20100901 | 00:45 | 
V    |  20100901 | 01:00 |
V    |  20100901 | 01:15 | 23050
(etc...)

This data is read and stored by my Java program; the excerpt above is only an indication of the values I am working with.
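
As an illustration of how rows like these might be read on the Java side, here is a minimal sketch. The `Reading` class and its field names are hypothetical, not the program's actual code. It assumes the time column is stored as seconds since midnight, which is consistent with the datetime values 0, 900, 4500 and 5400 shown further down, and it represents a missing value as an empty `OptionalInt`.

```java
import java.util.OptionalInt;

/** One row of the pipe-delimited file (hypothetical type, for illustration only). */
final class Reading {
    final String flag;
    final String date;        // yyyyMMdd, kept as text
    final int secondsOfDay;   // assumption: "00:15" is stored as 900
    final OptionalInt value;  // empty when the reading is missing

    Reading(String flag, String date, int secondsOfDay, OptionalInt value) {
        this.flag = flag;
        this.date = date;
        this.secondsOfDay = secondsOfDay;
        this.value = value;
    }

    /** Parses a row such as "V | 20100901 | 00:15 | 23280". */
    static Reading parse(String line) {
        // A limit of -1 keeps the trailing empty field of rows that end in "|".
        String[] parts = line.split("\\|", -1);
        if (parts.length < 3) {
            throw new IllegalArgumentException("Malformed row: " + line);
        }
        String flag = parts[0].trim();
        String date = parts[1].trim();

        // Convert "HH:mm" to seconds since midnight.
        String[] hm = parts[2].trim().split(":");
        int seconds = Integer.parseInt(hm[0]) * 3600 + Integer.parseInt(hm[1]) * 60;

        // A blank or absent fourth field means the value is missing.
        String raw = parts.length > 3 ? parts[3].trim() : "";
        OptionalInt value = raw.isEmpty()
                ? OptionalInt.empty()
                : OptionalInt.of(Integer.parseInt(raw));

        return new Reading(flag, date, seconds, value);
    }
}
```

Rows whose `value` is empty would be the ones to predict, and the rest would be available for training.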

I then create the RApp in Java and, after reading a file, start processing it. My training input is shown below. (Note: as an example I use a small amount of data, namely 9 rows, but my files usually have around 35000 rows. I also generate some flags from the values read, for year, month, day of week, day of month, and so on, which is why you see values that are not present in the file excerpt above.)

Training Input (uses 50% of the complete data)

[VECTOR ([INT* (2, 2, 2, 2)], [INT* (2010, 2010, 2010, 2010)], [INT* (9, 9, 9, 9)], [INT* (39, 39, 39, 39)], [INT* (3, 3, 3, 3)], [INT* (39, 39, 39, 39)], [INT* (0, 900, 4500, 5400)])]

Created with the code:

re.assign("season_flag", p_file.getSeasonArray(ANNEnum.TRAINING));
re.assign("year_flag", p_file.getYearArray(ANNEnum.TRAINING));
re.assign("month_flag", p_file.getMonthArray(ANNEnum.TRAINING));
re.assign("week_flag", p_file.getWeekArray(ANNEnum.TRAINING));
re.assign("day_of_week_flag", p_file.getDayOfWeekArray(ANNEnum.TRAINING));
re.assign("weekend_flag", p_file.getWeekendArray(ANNEnum.TRAINING));
re.assign("datetime", p_file.getTimeArray(ANNEnum.TRAINING));
re.eval("trainingInput <- data.frame(season_flag,year_flag,month_flag,week_flag,day_of_week_flag,weekend_flag,datetime)");

Training Output

[INT* (23180, 23280, 23050, 23110)]

Created with the code:

re.assign("trainingOutput", p_file.getValueArray(ANNEnum.TRAINING));

Then my

Test Data

[VECTOR ([INT* (2, 2)], [INT* (2010, 2010)], [INT* (9, 9)], [INT* (39, 39)], [INT* (3, 3)], [INT* (0, 0)], [INT* (10800, 11700)])]

The test data is created the same way as the Training Input.

Then I call the R script:

re.eval("network <- runANN(trainingInput, inputColNames, trainingOutput, outputColNames, testData, " + layercount + ", " + threshold + ")");

All the values are defined beforehand.

The R script is as follows

runANN <- function(trainingInput, inputColNames, trainingOutput, outputColNames, testData, hiddenLayers, threshold){
  library("neuralnet")

  #Column bind the data into one variable
  trainingdata <- cbind(trainingInput,trainingOutput)

  colnames(trainingdata) <- c(outputColNames,inputColNames)

  trainingdata <- as.data.frame(trainingdata)
  #construct formula
  formula <- as.formula(paste(paste(outputColNames, collapse= "+"), paste("~", paste(inputColNames, collapse= "+"))))

  #Train the neural network
  net.sqrt <- neuralnet(formula,trainingdata, hidden=hiddenLayers, threshold=threshold)

  colnames(testData) <- c(inputColNames)

  testData <- as.data.frame(testData)

  #Test the neural network on some training data
  net.results <- compute(net.sqrt, testData) #Run them through the neural network

  #Lets see the results
  #print(net.results$net.result)

  return(print(net.results$net.result))
}

And here is my problem. The results this gives me are:

          [,1]
[1,] 2.00002384
[2,] 2.00002384
[REAL* (2.000023839778315, 2.000023839778315)]

I was expecting values around 23000. Obviously I am doing something wrong along the way, but I can't figure out what. I appreciate any help someone might give.

Thank you for your time.


Solution

  • It ended up being an error in the R script:

    colnames(trainingdata) <- c(outputColNames,inputColNames)
    

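    This line made the first column the output column instead of the last. Since the first column contains only the value 2, the results are to be expected. Because cbind(trainingInput, trainingOutput) puts the inputs first, the names need to be assigned as c(inputColNames, outputColNames).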

    The other problem, getting the same value for both results, came from mistakenly not normalizing the input and output data before putting it through the network.

    Thank you to everyone who attempted to help me with this issue.
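
For the normalization step mentioned above, one common option is min-max scaling to the range [0, 1], with the inverse applied to the network's predictions to get back to the original units. The sketch below shows the idea on the Java side, before the arrays are handed to R. It is an assumption about how the scaling could be done, not the code actually used, and `MinMaxScaler` is a hypothetical helper name.

```java
/** Min-max scaling to [0, 1] and back (hypothetical helper, for illustration only). */
final class MinMaxScaler {
    private final double min;
    private final double max;

    private MinMaxScaler(double min, double max) {
        this.min = min;
        this.max = max;
    }

    /** Learns the range from the training values only. */
    static MinMaxScaler fit(int[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("Cannot fit a scaler on an empty array");
        }
        int lo = values[0];
        int hi = values[0];
        for (int v : values) {
            lo = Math.min(lo, v);
            hi = Math.max(hi, v);
        }
        return new MinMaxScaler(lo, hi);
    }

    /** Maps each value to (v - min) / (max - min); a constant column maps to 0. */
    double[] transform(int[] values) {
        double range = max - min;
        double[] scaled = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            scaled[i] = range == 0 ? 0.0 : (values[i] - min) / range;
        }
        return scaled;
    }

    /** Converts a scaled prediction back to the original units. */
    double inverse(double scaled) {
        return scaled * (max - min) + min;
    }
}
```

With the training output from the question (23180, 23280, 23050, 23110), the fitted range is 23050 to 23280, so a scaled prediction of 0.5 maps back to 23165. Each input column would get its own scaler, fitted on the training rows and reused for the test rows, and the resulting double arrays would presumably be assigned to R in place of the int arrays.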