Tags: csv, neural-network, classification, encog

How Do I Normalize CSV Input Data in Encog?


I have successfully built a neural network using Jeff Heaton's Encog library. I am currently using it to classify the Iris plants dataset.

The problem I now have is as follows:

I train on a CSV file that contains the ideal output. For recognition I want to use a separate CSV file that has no output field, but when I try to normalize that file I get the following error:

The Error:

"Can't determine target field automatically, please specify one.

This can also happen if you specified the wrong file format."

This is the method:

    public void NormalizeFile(FileInfo SourceDataFile, FileInfo NormalizedDataFile)
    {
        var wizard = new AnalystWizard(_analyst);

        // This line errors
        wizard.Wizard(SourceDataFile, _useHeaders, AnalystFileFormat.DecpntComma);

        var norm = new AnalystNormalizeCSV();
        norm.Analyze(SourceDataFile, _useHeaders, CSVFormat.English, _analyst);
        norm.ProduceOutputHeaders = _useHeaders;
        norm.Normalize(NormalizedDataFile);
    }

During training, which involves normalizing the training data, I save the normalization data. I then reload this normalization data when recognizing.

If I keep the output column in the data that I'm recognizing, then it works! But what about new data whose classification is unknown?

For example, when I use the following format for a file to be recognized:

sepal_l, sepal_w, petal_l, petal_w, name

then it adds another column with the predicted output like this:

sepal_l, sepal_w, petal_l, petal_w, name, prediction,

however I want to be able to enter files without the name column.

Many Thanks, Kiran


Solution

  • You can accomplish this in several ways, but the typical flow is:

    take your data --> normalize it using the normalization information stored in the Encog analyst --> build an input array from the normalized values --> pass it to the trained network and compute the output (the predicted class, in a classification problem)

    I have updated the Iris demo (evaluation phase) for this scenario, which I covered in a Pluralsight course.

    Here is the relevant portion of the code:

        // Evaluating a new data set with no class information
        // (ideal size is 0, so only the input columns are read)
        var extraEvaluationSet = EncogUtility.LoadCSV2Memory(Config.ExtraEvaluationFile.ToString(),
            network.InputCount, 0, true, CSVFormat.English, false);

        int extraFileCount = 0;
        using (var file = new System.IO.StreamWriter(Config.ExtraEvaluationFileOutput.ToString()))
        {
            file.WriteLine("sepal_l,sepal_w,petal_l,petal_w,predicted");
            foreach (var item in extraEvaluationSet)
            {
                // normalize each input with the ranges stored in the analyst
                double normalized_sepal_l = analyst.Script.Normalize.NormalizedFields[0].Normalize(item.Input[0]);
                double normalized_sepal_w = analyst.Script.Normalize.NormalizedFields[1].Normalize(item.Input[1]);
                double normalized_petal_l = analyst.Script.Normalize.NormalizedFields[2].Normalize(item.Input[2]);
                double normalized_petal_w = analyst.Script.Normalize.NormalizedFields[3].Normalize(item.Input[3]);
                double[] inputToNetwork = { normalized_sepal_l, normalized_sepal_w, normalized_petal_l, normalized_petal_w };

                extraFileCount++;
                // output
                var output = network.Compute(new BasicMLData(inputToNetwork));

                // field 4 is the class field from the training data
                int classCount = analyst.Script.Normalize.NormalizedFields[4].Classes.Count;
                double normalizationHigh = analyst.Script.Normalize.NormalizedFields[4].NormalizedHigh;
                double normalizationLow = analyst.Script.Normalize.NormalizedFields[4].NormalizedLow;

                // decode the equilateral-encoded output back to a class name
                var eq = new Encog.MathUtil.Equilateral(classCount, normalizationHigh, normalizationLow);
                var predictedClassInt = eq.Decode(output);
                var predictedClass = analyst.Script.Normalize.NormalizedFields[4].Classes[predictedClassInt].Name;
                var resultLine = string.Format("{0},{1},{2},{3},{4}", item.Input[0], item.Input[1], item.Input[2], item.Input[3], predictedClass);
                file.WriteLine(resultLine);
                Console.WriteLine("Count :{0} Properties [{1},{2},{3},{4}] ,Predicted : {5} ",
                    extraFileCount, item.Input[0], item.Input[1], item.Input[2], item.Input[3], predictedClass);
            }
        }

    The demo code is available at the following link: http://bit.ly/1GRg0u7 (please edit the data folder path before running it)
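
To make the normalization step in the flow above more concrete: for a range-normalized numeric field, the stored information boils down to the low and high values seen during training plus the target range. The following is a minimal sketch in plain C# without Encog, assuming simple min-max scaling. The class name `RangeNormalizationSketch`, the training range 4.3 to 7.9, and the target range -1 to 1 are illustrative assumptions, not values taken from the demo.

```csharp
using System;

// Hypothetical helper, not part of Encog: min-max scaling with stored bounds.
public static class RangeNormalizationSketch
{
    // Maps value from [actualLow, actualHigh] onto [normalizedLow, normalizedHigh].
    public static double Normalize(double value, double actualLow, double actualHigh,
        double normalizedLow, double normalizedHigh)
    {
        if (actualHigh == actualLow)
            throw new ArgumentException("actualHigh and actualLow must differ.");

        return (value - actualLow) / (actualHigh - actualLow)
               * (normalizedHigh - normalizedLow) + normalizedLow;
    }

    public static void Main()
    {
        // Assumed bounds recorded from the training data for one input column.
        double trainingLow = 4.3, trainingHigh = 7.9;

        // New, unlabeled values are scaled with the training bounds,
        // so no output column is needed at this point.
        double[] newValues = { 4.3, 5.0, 7.9 };
        foreach (double v in newValues)
        {
            double scaled = Normalize(v, trainingLow, trainingHigh, -1.0, 1.0);
            Console.WriteLine("{0} -> {1:F3}", v, scaled);
        }
    }
}
```

Because the bounds come from the training data, a file to be recognized only needs the input columns. With this formula, a value outside the training range simply maps outside the target range.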