Encog CSV Loading Exception: "Can't access column 15 in a file that has only 15 columns."


I am using encog-core-cs, whose assembly info reports version 3.3.0.0.

When I call EncogUtility.LoadCSV2Memory(), the call fails with the message given in the title.

I am providing LoadCSV2Memory() with what appears to be a properly normalized csv file, with all values below the header row containing floats between 0 and 0.9999...

The normalized csv file contains 15 columns (through column "O" when viewed in Excel), and I pass the number 15 as the "input" argument to LoadCSV2Memory(). Here is the code; "normalizedTrainingFile" is a System.IO.FileSystemInfo:

let prune() =
    let trainingSet = EncogUtility.LoadCSV2Memory(normalizedTrainingFile.FullName, 15, 1, true, CSVFormat.English, false)
    let pattern = new FeedForwardPattern(InputNeurons = 25, OutputNeurons = 1, ActivationFunction = ActivationTANH())   
    let prune = new PruneIncremental(trainingSet, pattern, 100, 1, 10, StatusReporter())
    prune.AddHiddenLayer(1, 10)
    prune.AddHiddenLayer(0, 10)
    prune.Process()
    EncogDirectoryPersistence.SaveObject(trainedNetworkFile, prune.BestNetwork)

I can get past this error by subtracting 1 from the number of columns (making the value 14) that I pass as the input ("count") argument to LoadCSV2Memory(), on the assumption that this is a 0-based index vs. 1-based count problem. But now, when my code execution reaches ...

prune.Process()

...Encog throws an exception "Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection." in a call to System.Buffer.BlockCopy(...) which is in Encog's EngineArray.ArrayCopy(...) method.

After several hours of trying to step through the code, I feel like it sure would be nice if the internet were to contain a solution to what is likely my misuse of the Encog framework. Thank you.

Update: Here's a snippet from the CSV including the headers and first 3 lines of data:

"DayOfMonth(p0)","DayOfMonth(p1)","DayOfMonth(p2)","DayOfMonth(p3)","DayOfMonth(p4)","DayOfMonth(p5)","DayOfMonth(p6)","DayOfWeek(p0)","DayOfWeek(p1)","DayOfWeek(p2)","DayOfWeek(p3)","DayOfWeek(p4)","DayOfWeek(p5)","MinuteOfDay","Value"
0.755928946018455,-0.436435780471985,-0.308606699924184,-0.239045721866879,-0.195180014589707,-0.164957219768465,-0.142857142857143,-0.763762615825973,-0.440958551844098,-0.311804782231162,-0.241522945769824,-0.197202659436654,-0.166666666666667,-0.853658536585366,-0.964430519719867
0,0.87287156094397,-0.308606699924184,-0.239045721866879,-0.195180014589707,-0.164957219768465,-0.142857142857143,0,0.881917103688197,-0.311804782231162,-0.241522945769824,-0.197202659436654,-0.166666666666667,0.114982578397212,0.389052709178032
-0.755928946018455,-0.436435780471985,-0.308606699924184,-0.239045721866879,-0.195180014589707,-0.164957219768465,-0.142857142857143,0,0,0,0.966091783079296,-0.197202659436654,-0.166666666666667,0.240418118466899,0.173608551419093

Solution

  • If you change the 15 above to a 14 your code will work. The parameters for the function are:

    filename, input columns, ideal columns, etc.

    input columns + ideal columns = total columns in the file

    Because you are telling it that you have 15 inputs and 1 ideal, the function is expecting 16 total.

    The error message is somewhat misleading. It makes a little more sense once you realize that the column indexes are zero-based: it is trying to read column index 15 (the 16th column), which does not exist in your file. I've added it to my list to revise that error message.

    Updated to address your 2nd question:

    You are getting an out-of-bounds error because you are trying to train a network with 25 input neurons on a dataset that has 14 inputs. Change your pattern line to this and it will work:

    let pattern = new FeedForwardPattern(InputNeurons = 14, OutputNeurons = 1, ActivationFunction = ActivationTANH())
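Both errors come from the same mismatch, so it may help to derive the input count from the file rather than hard-coding it. The following is only a rough sketch in plain F#, not Encog code: countColumns and inputCountFor are hypothetical helpers made up for illustration, and the comma split assumes no column name contains an embedded comma (true for the header shown in the question).

```fsharp
// Hypothetical helper (not part of Encog): count the columns in a CSV header line.
// Assumes no column name contains an embedded comma.
let countColumns (headerLine: string) =
    headerLine.Split(',').Length

// Hypothetical helper: input columns = total columns - ideal columns.
let inputCountFor (totalColumns: int) (idealCount: int) =
    if idealCount < 0 || idealCount >= totalColumns then
        invalidArg "idealCount" "ideal columns must be non-negative and fewer than the total"
    totalColumns - idealCount

// Rebuild the header from the question: 7 DayOfMonth columns, 6 DayOfWeek
// columns, then MinuteOfDay and Value.
let header =
    [ for i in 0 .. 6 -> sprintf "\"DayOfMonth(p%d)\"" i ]
    @ [ for i in 0 .. 5 -> sprintf "\"DayOfWeek(p%d)\"" i ]
    @ [ "\"MinuteOfDay\""; "\"Value\"" ]
    |> String.concat ","

let totalColumns = countColumns header
let inputCount = inputCountFor totalColumns 1

// prints "total = 15, input = 14, ideal = 1"
printfn "total = %d, input = %d, ideal = %d" totalColumns inputCount 1
```

The resulting inputCount (14 here) is the value to pass both as the input argument of LoadCSV2Memory() and as InputNeurons in the FeedForwardPattern, so the two cannot drift apart.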