In Encog 3.x, how do you normalize data, use it for training, and denormalize results?
I can't find good documentation on this, and a simple example that applies all three steps would go a long way toward reducing the Encog learning curve. I haven't figured it all out yet, but here are some resources.
(1) *How does Encog 3.0 Normalize?*
This code is fine for saving a new normalized CSV file. It is not clear here though how to take the AnalystNormalizeCSV and convert it to an MLDataSet to actually use it.
EncogAnalyst analyst = new EncogAnalyst();
AnalystWizard wizard = new AnalystWizard(analyst);
wizard.wizard(sourceFile, true, AnalystFileFormat.DECPNT_COMMA);
final AnalystNormalizeCSV norm = new AnalystNormalizeCSV();
norm.analyze(sourceFile, true, CSVFormat.ENGLISH, analyst);
norm.setOutputFormat(CSVFormat.ENGLISH);
norm.setProduceOutputHeaders(true);
norm.normalize(targetFile);
(2) *How do I normalize a CSV file with Encog (Java)*
Again, this code is fine for producing normalized CSV output, but it is unclear how to take the normalized data and actually use it. There is a method for setting the target as an MLData, but it assumes all columns are inputs and leaves no room for any ideal values. Furthermore, both of these options are difficult to use when the file has headers and/or unused columns.
try {
    File rawFile = new File(MYDIR, "iris.csv");
    // download Iris data from UCI
    if (rawFile.exists()) {
        System.out.println("Data already downloaded to: " + rawFile.getPath());
    } else {
        System.out.println("Downloading iris data to: " + rawFile.getPath());
        BotUtil.downloadPage(new URL("http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"), rawFile);
    }
    // define the format of the iris data
    DataNormalization norm = new DataNormalization();
    InputField inputSepalLength, inputSepalWidth, inputPetalLength, inputPetalWidth;
    InputFieldCSVText inputClass;
    norm.addInputField(inputSepalLength = new InputFieldCSV(true, rawFile, 0));
    norm.addInputField(inputSepalWidth = new InputFieldCSV(true, rawFile, 1));
    norm.addInputField(inputPetalLength = new InputFieldCSV(true, rawFile, 2));
    norm.addInputField(inputPetalWidth = new InputFieldCSV(true, rawFile, 3));
    norm.addInputField(inputClass = new InputFieldCSVText(true, rawFile, 4));
    inputClass.addMapping("Iris-setosa");
    inputClass.addMapping("Iris-versicolor");
    inputClass.addMapping("Iris-virginica");
    // define how we should normalize
    norm.addOutputField(new OutputFieldRangeMapped(inputSepalLength, 0, 1));
    norm.addOutputField(new OutputFieldRangeMapped(inputSepalWidth, 0, 1));
    norm.addOutputField(new OutputFieldRangeMapped(inputPetalLength, 0, 1));
    norm.addOutputField(new OutputFieldRangeMapped(inputPetalWidth, 0, 1));
    norm.addOutputField(new OutputOneOf(inputClass, 1, 0));
    // define where the output should go
    File outputFile = new File(MYDIR, "iris_normalized.csv");
    norm.setCSVFormat(CSVFormat.ENGLISH);
    norm.setTarget(new NormalizationStorageCSV(CSVFormat.ENGLISH, outputFile));
    // process
    norm.setReport(new ConsoleStatusReportable());
    norm.process();
    // report the normalized file, not the raw input file
    System.out.println("Output written to: " + outputFile.getPath());
} catch (Exception ex) {
    ex.printStackTrace();
}
(3) *Denormalizing*
I'm at a total loss as to how to take all of this and denormalize the results according to the maximum and minimum of the appropriate data type.
Here are a few resources where you can find more detailed information about normalization and denormalization with the Encog framework.
These e-books, written by Jeff Heaton himself, are must-haves for Encog users:
1. Programming Neural Networks with Encog3 in C#, 2nd Edition, by Jeff Heaton (Oct 2, 2011)
2. Introduction to Neural Networks for C#, 2nd Edition, by Jeff Heaton (Oct 2, 2008)
You can also have a look at the Pluralsight course "Introduction to Machine Learning with ENCOG", which includes a few examples of normalization and denormalization.
Now, regarding your question: "It is not clear here though how to take the AnalystNormalizeCSV and convert it to an MLDataSet to actually use it."
You can use AnalystNormalizeCSV to normalize your training file, and then use the LoadCSV2Memory method of the EncogUtility class to load the normalized training file into an MLDataSet. Here is some sample code in C#:
var trainingSet = EncogUtility.LoadCSV2Memory(Config.NormalizedTrainingFile.ToString(),
    network.InputCount, network.OutputCount, true, CSVFormat.English, false);
It takes the normalized training file as the first parameter, the network's input neuron count as the second, and the network's output neuron count as the third. The fourth parameter is a boolean that says whether the CSV file has a header row, the fifth is the CSV format, and the sixth is a boolean for significance.
Once you have this dataset in memory, you can use it for training. A similar approach can be taken for the cross-validation and evaluation steps.
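To make the input and output counts more concrete, here is a minimal plain-Java sketch (no Encog calls) of how one row of an already-normalized CSV can be split into an input array and an ideal array. It assumes the input columns come first and the ideal columns follow them, which is how I understand the loader to interpret the two counts. The class name RowSplitSketch, its helper methods, and the sample row are hypothetical and only for illustration.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Encog: splits one normalized CSV row
// into input values and ideal (expected output) values.
class RowSplitSketch {

    // Parses one comma-separated line of numeric values.
    static double[] parseRow(String line) {
        String[] parts = line.split(",");
        double[] values = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Double.parseDouble(parts[i].trim());
        }
        return values;
    }

    // Assumption: the first inputCount columns are the network inputs.
    static double[] inputOf(double[] row, int inputCount) {
        if (row.length < inputCount) {
            throw new IllegalArgumentException("row has fewer columns than inputCount");
        }
        return Arrays.copyOfRange(row, 0, inputCount);
    }

    // Assumption: the next idealCount columns are the ideal outputs.
    static double[] idealOf(double[] row, int inputCount, int idealCount) {
        if (row.length < inputCount + idealCount) {
            throw new IllegalArgumentException("row has fewer columns than inputCount + idealCount");
        }
        return Arrays.copyOfRange(row, inputCount, inputCount + idealCount);
    }

    public static void main(String[] args) {
        // Made-up normalized row: 4 range-mapped inputs, then a 3-column one-of-n class.
        double[] row = parseRow("0.22,0.63,0.07,0.04,1,0,0");
        System.out.println("input: " + Arrays.toString(inputOf(row, 4)));
        System.out.println("ideal: " + Arrays.toString(idealOf(row, 4, 3)));
    }
}
```

For the iris file in the question, that would mean an input count of 4 and an ideal count of 3, assuming the class is written as three one-of-n columns after the four measurements.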
Regarding denormalization, you can first persist the analyst file and later use it to denormalize individual columns. For example:
var denormalizedOutput = analyst.Script.Normalize.NormalizedFields[index].DeNormalize(item.Input[index]);
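If it helps to see the arithmetic, here is a minimal plain-Java sketch of standard min-max range mapping and its inverse. It assumes a range-normalized field stores the column's actual low/high and the normalized low/high, and that denormalizing simply inverts the mapping. It does not call Encog; the class RangeMapSketch, its method names, and the sample values are hypothetical.

```java
// Hypothetical helper, not part of Encog: min-max range mapping and its inverse.
class RangeMapSketch {

    // Maps value from [actualLow, actualHigh] to [normLow, normHigh].
    static double normalize(double value, double actualLow, double actualHigh,
                            double normLow, double normHigh) {
        return (value - actualLow) / (actualHigh - actualLow) * (normHigh - normLow) + normLow;
    }

    // Inverse mapping: takes a normalized value back to the original scale.
    static double denormalize(double normalized, double actualLow, double actualHigh,
                              double normLow, double normHigh) {
        return (normalized - normLow) / (normHigh - normLow) * (actualHigh - actualLow) + actualLow;
    }

    public static void main(String[] args) {
        // Made-up column statistics; in practice these come from analyzing the raw data.
        double actualLow = 4.3;
        double actualHigh = 7.9;

        double normalized = normalize(5.1, actualLow, actualHigh, 0.0, 1.0);
        double restored = denormalize(normalized, actualLow, actualHigh, 0.0, 1.0);
        System.out.println(normalized + " -> " + restored);
    }
}
```

The point is that the low and high values used for normalization have to be kept around, which is why persisting the analyst file matters: without them the mapping cannot be inverted.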
The analyst can be used in a similar way to turn a predicted class index back into its class label. For example:
var predictedClass = analyst.Script.Normalize.NormalizedFields[index].Classes[predictedClassInt].Name;
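Where predictedClassInt comes from depends on the encoding. For a one-of-n encoded class (as with OutputOneOf in the question), a common way to get it is to take the index of the largest network output and look that index up in the class list. Here is a minimal plain-Java sketch of that idea. It does not call Encog; the class OneOfNSketch, its methods, and the sample output vector are hypothetical, and it assumes the class names are listed in the same order that was used when normalizing.

```java
// Hypothetical helper, not part of Encog: decodes a one-of-n output vector
// into a class label by picking the index of the largest value.
class OneOfNSketch {

    // Returns the index of the largest element (first one wins on ties).
    static int argMax(double[] output) {
        if (output.length == 0) {
            throw new IllegalArgumentException("output must not be empty");
        }
        int best = 0;
        for (int i = 1; i < output.length; i++) {
            if (output[i] > output[best]) {
                best = i;
            }
        }
        return best;
    }

    // Assumption: classNames is in the same order as the one-of-n columns.
    static String decode(double[] output, String[] classNames) {
        if (output.length != classNames.length) {
            throw new IllegalArgumentException("one output value is expected per class");
        }
        return classNames[argMax(output)];
    }

    public static void main(String[] args) {
        String[] classNames = {"Iris-setosa", "Iris-versicolor", "Iris-virginica"};
        double[] networkOutput = {0.1, 0.8, 0.2}; // made-up network output
        System.out.println(decode(networkOutput, classNames));
    }
}
```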