So this is my problem.
I am using an R script to create a neural network that generates the missing values in a file. The file looks like this:
Flag | Date | Time | Value
V | 20100901 | 00:00 | 23180
V | 20100901 | 00:15 | 23280
V | 20100901 | 00:30 |
V | 20100901 | 00:45 |
V | 20100901 | 01:00 |
V | 20100901 | 01:15 | 23050
(etc...)
This data is read and stored by my Java program; the excerpt above is just an indication of the values I am working with.
I then create the RApp in Java and, after reading a file, I process it. My training input is shown below. (Note: as an example I use a small amount of data, namely 9 rows, but my files usually have around 35000 rows. I also generate some tags from the values read, for year, month, day of week, day of month, and so on, which is why you see values that are not present in the file example above.)
Training Input (uses 50% of the complete data)
[VECTOR ([INT* (2, 2, 2, 2)], [INT* (2010, 2010, 2010, 2010)], [INT* (9, 9, 9, 9)], [INT* (39, 39, 39, 39)], [INT* (3, 3, 3, 3)], [INT* (39, 39, 39, 39)], [INT* (0, 900, 4500, 5400)])]
Created with the code:
re.assign("season_flag", p_file.getSeasonArray(ANNEnum.TRAINING));
re.assign("year_flag", p_file.getYearArray(ANNEnum.TRAINING));
re.assign("month_flag", p_file.getMonthArray(ANNEnum.TRAINING));
re.assign("week_flag", p_file.getWeekArray(ANNEnum.TRAINING));
re.assign("day_of_week_flag", p_file.getDayOfWeekArray(ANNEnum.TRAINING));
re.assign("weekend_flag", p_file.getWeekendArray(ANNEnum.TRAINING));
re.assign("datetime", p_file.getTimeArray(ANNEnum.TRAINING));
re.eval("trainingInput <- data.frame(season_flag,year_flag,month_flag,week_flag,day_of_week_flag,weekend_flag,datetime)");
Training Output
[INT* (23180, 23280, 23050, 23110)]
Created with the code:
re.assign("trainingOutput", p_file.getValueArray(ANNEnum.TRAINING));
Then my
Test Data
[VECTOR ([INT* (2, 2)], [INT* (2010, 2010)], [INT* (9, 9)], [INT* (39, 39)], [INT* (3, 3)], [INT* (0, 0)], [INT* (10800, 11700)])]
The test data is created the same way as the training input.
Then I call the R script:
re.eval("network <- runANN(trainingInput, inputColNames, trainingOutput, outputColNames, testData, " + layercount + ", " + threshold + ")");
All the values are defined beforehand.
The R script is as follows
runANN <- function(trainingInput, inputColNames, trainingOutput, outputColNames, testData, hiddenLayers, threshold){
library("neuralnet")
#Column bind the data into one variable
trainingdata <- cbind(trainingInput,trainingOutput)
colnames(trainingdata) <- c(outputColNames,inputColNames)
trainingdata <- as.data.frame(trainingdata)
#construct formula
formula <- as.formula(paste(paste(outputColNames, collapse= "+"), paste("~", paste(inputColNames, collapse= "+"))))
#Train the neural network
net.sqrt <- neuralnet(formula,trainingdata, hidden=hiddenLayers, threshold=threshold)
colnames(testData) <- c(inputColNames)
testData <- as.data.frame(testData)
#Test the neural network on some training data
net.results <- compute(net.sqrt, testData) #Run them through the neural network
#Lets see the results
#print(net.results$net.result)
return(print(net.results$net.result))
}
And here comes my problem. The results this gives me are:
[,1]
[1,] 2.00002384
[2,] 2.00002384
[REAL* (2.000023839778315, 2.000023839778315)]
I was expecting values around 23000. Obviously I am doing something wrong along the way, but I can't figure out what. I appreciate any help anyone can give.
Thank you for your time.
It ended up being an error in the R script:
colnames(trainingdata) <- c(outputColNames,inputColNames)
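This line named the first column as the output column instead of the last. Since cbind(trainingInput, trainingOutput) puts the inputs first, the network was trained to predict the first input column, which only contains the value 2, so a result of about 2 is to be expected. The names have to be assigned in the same order as the columns, that is c(inputColNames, outputColNames).
The other problem, getting the same value for both test rows, came from mistakenly not normalizing the input and output data before putting it through the network.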
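A minimal sketch of one way to do such a normalization on the Java side, before the arrays are handed to R. It assumes min-max scaling to [0, 1]; the post does not say which normalization was actually used. The class MinMaxScaler and its methods are hypothetical names made up for this illustration, and mapping a constant column (such as a year column that is always 2010) to 0 is also an assumption.

```java
import java.util.Arrays;

/** Hypothetical helper (not from the original program): min-max scaling to [0, 1]. */
public class MinMaxScaler {
    private final double min;
    private final double max;

    /** Learns the range from the training values only. */
    public MinMaxScaler(double[] trainingValues) {
        this.min = Arrays.stream(trainingValues).min()
                .orElseThrow(() -> new IllegalArgumentException("no training values"));
        this.max = Arrays.stream(trainingValues).max()
                .orElseThrow(() -> new IllegalArgumentException("no training values"));
    }

    /** Maps values into [0, 1] relative to the training range. */
    public double[] scale(double[] values) {
        double range = max - min;
        // Assumption: a constant column carries no information, so map it to 0
        // instead of dividing by zero.
        return Arrays.stream(values)
                .map(v -> range == 0.0 ? 0.0 : (v - min) / range)
                .toArray();
    }

    /** Inverse of scale: converts network outputs back to the original units. */
    public double[] unscale(double[] scaled) {
        double range = max - min;
        return Arrays.stream(scaled)
                .map(s -> s * range + min)
                .toArray();
    }
}
```

With the training output shown in the question, scale would map 23050 to 0 and 23280 to 1. The values returned by compute would then go through unscale to get back to the original units. The same scaler fitted on the training data should be reused for the test data, so both are on the same scale.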
Thank you to everyone who attempted to help me with this issue.