
Encog C#, VersatileMLDataSet from CSV, how to get original data?


I want to use the CSV reader from the Encog library, like this:

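    using Encog.ML.Data.Versatile;
    using Encog.ML.Data.Versatile.Sources;
    using Encog.Util.CSV;
    var format = new CSVFormat('.', ' '); // '.' = decimal point, ' ' = column separator
    IVersatileDataSource source = new CSVDataSource(filename, false, format); // false = no header row
    var data = new VersatileMLDataSet(source);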

Is it possible to get the original data back from the variable data? I have to show the records from the CSV file to the user in a DataGridView before I use them for the neural network. I also want to be able to modify the original data. According to the documentation there is a property Data, but it doesn't work for me. If I try something like:

    data.Data[1][1]

I get a NullReferenceException. There is another problem with using the data before normalization. I want to get the count of records with:

    data.GetRecordCount()

But I get the error "You must normalize the dataset before using it." So even if I have not used the data yet, I have to normalize it? If so, it is probably better to use my own CSV reader and then load the data into Encog from memory, right?


Solution

  • So I just looked at the Encog source code on GitHub. Thankfully your question is well defined and narrow in scope, so I can provide an answer. Unfortunately, you probably won't like it.

    Basically, when you pass your IVersatileDataSource into the constructor of VersatileMLDataSet, it is stored in a private readonly field called _source. No property or method exposes _source, so you cannot access it from outside VersatileMLDataSet.

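    The Data property is indeed populated only during the normalization process. That is also why GetRecordCount() fails with "You must normalize the dataset before using it.": it depends on that same normalized data. There also don't appear to be any public fields in CSVDataSource that would be useful to you (again, they are all private).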

    If you just want to look at a single column of data, you could stay within Encog and look at Encog.Util.NetworkUtil.QuickCSVUtils. This class has methods that will help you load a file and quickly get a single column of data out of it.

    If you want to get the full CSV data out of a file within Encog, you could use the Encog.Util.CSV.ReadCSV class. It is the underlying implementation that QuickCSVUtils uses anyway. You will have to write some wrapper logic around ReadCSV, similar to QuickCSVUtils; if you go this route, I'd recommend looking at that class to see how it uses ReadCSV. Essentially, ReadCSV reads one line at a time.
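    A minimal sketch of such a wrapper, assuming the C# ReadCSV exposes Next(), the ColumnCount property, Get(int) and Close() (check these against the Encog version you use). RawCsvReader, ReadAll and ToDataTable are hypothetical names, not part of Encog. The records stay as raw strings, so nothing has to be normalized first, and the DataTable copy can be bound to a grid for display and editing.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using Encog.Util.CSV;

// Hypothetical helper (not part of Encog): wraps Encog's ReadCSV to pull
// raw, un-normalized records into memory.
public static class RawCsvReader
{
    // Reads every record of the file as an array of raw field strings.
    public static List<string[]> ReadAll(string filename, bool headers, CSVFormat format)
    {
        var rows = new List<string[]>();
        var csv = new ReadCSV(filename, headers, format);
        try
        {
            // Next() advances to the following line and returns false at end of file.
            while (csv.Next())
            {
                var row = new string[csv.ColumnCount];
                for (int i = 0; i < row.Length; i++)
                {
                    row[i] = csv.Get(i); // field text exactly as it appears in the file
                }
                rows.Add(row);
            }
        }
        finally
        {
            csv.Close();
        }
        return rows;
    }

    // Copies the raw records into a DataTable so they can be bound to a grid.
    public static DataTable ToDataTable(List<string[]> rows)
    {
        var table = new DataTable();
        int columns = 0;
        foreach (var row in rows)
        {
            columns = Math.Max(columns, row.Length);
        }
        for (int i = 0; i < columns; i++)
        {
            table.Columns.Add("Column" + i, typeof(string));
        }
        foreach (var row in rows)
        {
            table.Rows.Add(row); // the string[] is passed as the params object[] values
        }
        return table;
    }
}
```

    On a WinForms form you could then set `dataGridView1.DataSource = RawCsvReader.ToDataTable(rows);` (dataGridView1 being your grid) and use `rows.Count` as the record count. Edits made in the grid update the DataTable, not the file, so you would have to write them back yourself before handing the data to Encog.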

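    But if you really need to read the raw CSV data through the VersatileMLDataSet class, your best bet would be to write a custom class derived from VersatileMLDataSet that keeps its own reference to the data source (a derived class cannot see the private _source field either) and provides the raw-data access itself.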
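    A sketch of that idea, assuming IVersatileDataSource provides Rewind() and ReadLine() (the latter returning null once the data is exhausted), as CSVDataSource appears to in the Encog source. RawAccessVersatileMLDataSet and ReadRawRecords are hypothetical names. Because _source is private, the subclass simply stores its own copy of the reference passed to the constructor.

```csharp
using System.Collections.Generic;
using Encog.ML.Data.Versatile;
using Encog.ML.Data.Versatile.Sources;

// Hypothetical subclass (not part of Encog). VersatileMLDataSet keeps its
// source in a private field, so this class remembers the same source object.
public class RawAccessVersatileMLDataSet : VersatileMLDataSet
{
    private readonly IVersatileDataSource _rawSource;

    public RawAccessVersatileMLDataSet(IVersatileDataSource source) : base(source)
    {
        _rawSource = source;
    }

    // Reads every record from the source as raw strings, without normalizing.
    public List<string[]> ReadRawRecords()
    {
        var rows = new List<string[]>();
        _rawSource.Rewind(); // (re)start reading at the first record
        string[] line;
        while ((line = _rawSource.ReadLine()) != null)
        {
            rows.Add(line);
        }
        return rows;
    }
}
```

    With the code from the question, `new RawAccessVersatileMLDataSet(source).ReadRawRecords().Count` gives the record count without normalizing. Editing the returned arrays does not change what the dataset later reads from the file, and since you already hold `source`, the same Rewind()/ReadLine() loop also works on it directly, without a subclass.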