I have successfully built a neural network using Jeff Heaton's Encog library, and I am currently using it to classify the Iris plants dataset.
The problem I now have is as follows:
I train from a CSV dataset file that contains the ideal output. For recognition, I want to use a separate CSV file that has no output field. When I use this new CSV, I get the following error while trying to normalize the file:
The Error:
"Can't determine target field automatically, please specify one.
This can also happen if you specified the wrong file format."
This is the method:
public void NormalizeFile(FileInfo SourceDataFile, FileInfo NormalizedDataFile)
{
    var wizard = new AnalystWizard(_analyst);
    // This line errors
    wizard.Wizard(SourceDataFile, _useHeaders, AnalystFileFormat.DecpntComma);
    var norm = new AnalystNormalizeCSV();
    norm.Analyze(SourceDataFile, _useHeaders, CSVFormat.English, _analyst);
    norm.ProduceOutputHeaders = _useHeaders;
    norm.Normalize(NormalizedDataFile);
}
During training, which involves normalizing the training data, I save the normalization data. I then reload this normalization data when recognizing.
If I keep the output column in the data that I'm recognizing, then it works! But what about new data whose classification is unknown?
For example, when I use the following format for the file to be recognized:
sepal_l, sepal_w, petal_l, petal_w, name
then it adds another column with the predicted output like this:
sepal_l, sepal_w, petal_l, petal_w, name, prediction,
However, I want to be able to supply files without the name column.
Many Thanks, Kiran
You can accomplish this in several ways. The typical flow is:
take your data --> normalize it using the normalization information stored in the Encog analyst --> create an input array of the normalized inputs --> pass it to the trained network and compute the output (that is, predict the class in a classification problem)
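To make the normalization step concrete, here is a minimal sketch of range (min-max) normalization that reuses statistics captured at training time. It does not use Encog: the RangeStats class and its members are hypothetical names chosen for illustration, and the -1 to 1 target range is an assumption. Encog's analyst keeps equivalent per-field information for you, and its implementation may differ in details.

```csharp
using System;

// Hypothetical helper (not part of Encog): stores the value range seen in
// the training data so that new, unlabeled rows are scaled the same way.
public sealed class RangeStats
{
    public double ActualLow { get; }
    public double ActualHigh { get; }
    public double NormalizedLow { get; }
    public double NormalizedHigh { get; }

    public RangeStats(double actualLow, double actualHigh,
                      double normalizedLow = -1.0, double normalizedHigh = 1.0)
    {
        if (actualHigh <= actualLow)
            throw new ArgumentException("actualHigh must be greater than actualLow.");
        ActualLow = actualLow;
        ActualHigh = actualHigh;
        NormalizedLow = normalizedLow;
        NormalizedHigh = normalizedHigh;
    }

    // Maps a raw value from [ActualLow, ActualHigh] onto
    // [NormalizedLow, NormalizedHigh] by linear scaling.
    public double Normalize(double value)
    {
        double fraction = (value - ActualLow) / (ActualHigh - ActualLow);
        return fraction * (NormalizedHigh - NormalizedLow) + NormalizedLow;
    }
}
```

For instance, assuming a training range of 4.3 to 7.9 for sepal_l, new RangeStats(4.3, 7.9).Normalize(4.3) gives -1 and Normalize(7.9) gives 1. The point is that the range comes from the training data, not from the file being recognized, which is why the new file does not need the name column.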
I have updated the Iris demo (evaluation phase) for this scenario, which I covered in a Pluralsight course.
Here is the relevant portion of the code:
// Evaluating a new data set with no class information.
// Requires: using Encog.ML.Data.Basic; using Encog.Util.CSV; using Encog.Util.Simple;
// network, analyst and Config come from the earlier phases of the demo.
var extraEvaluationSet = EncogUtility.LoadCSV2Memory(Config.ExtraEvaluationFile.ToString(),
    network.InputCount, 0, true, CSVFormat.English, false);
int extraFileCount = 0;
using (var file = new System.IO.StreamWriter(Config.ExtraEvaluationFileOutput.ToString()))
{
    file.WriteLine("sepal_l,sepal_w,petal_l,petal_w,predicted");
    foreach (var item in extraEvaluationSet)
    {
        // Normalize each input with the field statistics stored in the analyst
        double normalized_sepal_l = analyst.Script.Normalize.NormalizedFields[0].Normalize(item.Input[0]);
        double normalized_sepal_w = analyst.Script.Normalize.NormalizedFields[1].Normalize(item.Input[1]);
        double normalized_petal_l = analyst.Script.Normalize.NormalizedFields[2].Normalize(item.Input[2]);
        double normalized_petal_w = analyst.Script.Normalize.NormalizedFields[3].Normalize(item.Input[3]);
        double[] inputToNetwork = { normalized_sepal_l, normalized_sepal_w, normalized_petal_l, normalized_petal_w };
        extraFileCount++;
        // Compute the network output
        var output = network.Compute(new BasicMLData(inputToNetwork));
        // Field 4 is the class field (name); decode the equilateral output back to a class name
        int classCount = analyst.Script.Normalize.NormalizedFields[4].Classes.Count;
        double normalizationHigh = analyst.Script.Normalize.NormalizedFields[4].NormalizedHigh;
        double normalizationLow = analyst.Script.Normalize.NormalizedFields[4].NormalizedLow;
        var eq = new Encog.MathUtil.Equilateral(classCount, normalizationHigh, normalizationLow);
        var predictedClassInt = eq.Decode(output);
        var predictedClass = analyst.Script.Normalize.NormalizedFields[4].Classes[predictedClassInt].Name;
        var resultLine = string.Format("{0},{1},{2},{3},{4}", item.Input[0], item.Input[1], item.Input[2], item.Input[3], predictedClass);
        file.WriteLine(resultLine);
        Console.WriteLine("Count :{0} Properties [{1},{2},{3},{4}] ,Predicted : {5} ",
            extraFileCount, item.Input[0], item.Input[1], item.Input[2], item.Input[3], predictedClass);
    }
}
The demo code is available at the following link: http://bit.ly/1GRg0u7 (please edit the data folder path before executing).