c# machine-learning ml.net

How to load supervised data into an MLContext object


My Situation: I am trying to create a neural network that classifies signals into two classes (essentially yes or no) using ML.NET. I have one set of data that maps to "no" and another that maps to "yes", and I want to train the network with this data.

My Problem: Since my training data is supervised (I know the desired output), how do I "tell" the LoadFromTextFile function that all the data in a given file should map to "yes" (or 1; the exact value doesn't matter)?

My Question: In short, how do you train a network with supervised data (where I know the desired output of my training data) in ML.NET?

My Data Model:

public class Analog
{
    [LoadColumn(0, Global.SAMPLE_SIZE - 1)]
    [VectorType(Global.SAMPLE_SIZE)]
    public float[] DiscreteSignal { get; set; }
}

Loading code:

//Create MLContext
static MLContext mCont = new MLContext();

//Load Data
IDataView data = mCont.Data.LoadFromTextFile<Analog>("myYesSignalData.csv", separatorChar: ',', hasHeader: false);

Solution

  • ML.NET supports loading multiple files into one IDataView by passing a MultiFileSource to a text loader created with CreateTextLoader:

    var loader = mCont.Data.CreateTextLoader<Analog>(separatorChar: ',', hasHeader: false);
    IDataView data = loader.Load(new MultiFileSource("myYesSignalData.csv", "myNoSignalData.csv"));
    

    However, I currently see no way to tell the trainer which examples are positive and which are negative other than adding a label column to both files: an all-ones column in the "yes" file and an all-zeros column in the "no" file. Then define the Analog class this way:

    public class Analog
    {
        [LoadColumn(0, Global.SAMPLE_SIZE - 1)]
        [VectorType(Global.SAMPLE_SIZE)]
        public float[] DiscreteSignal { get; set; }
    
        [LoadColumn(Global.SAMPLE_SIZE)]
        public float Label { get; set; }
    }
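
    Once both files carry the extra label column, loading them together might look like the sketch below. It is only an illustration under assumptions: the value of Global.SAMPLE_SIZE, the file name "myNewNoSignalData.txt", and the LabeledLoading and LoadLabeled names are hypothetical, and both files are assumed to be comma-separated with no header row. It needs the Microsoft.ML package.

    ```csharp
    using System;
    using System.Linq;
    using Microsoft.ML;
    using Microsoft.ML.Data;

    public static class Global
    {
        // Hypothetical value; use the real number of samples per signal.
        public const int SAMPLE_SIZE = 4;
    }

    public class Analog
    {
        [LoadColumn(0, Global.SAMPLE_SIZE - 1)]
        [VectorType(Global.SAMPLE_SIZE)]
        public float[] DiscreteSignal { get; set; }

        [LoadColumn(Global.SAMPLE_SIZE)]
        public float Label { get; set; }
    }

    public static class LabeledLoading
    {
        // Loads every given file into a single IDataView using the Analog schema.
        public static IDataView LoadLabeled(MLContext mlContext, params string[] paths)
        {
            var loader = mlContext.Data.CreateTextLoader<Analog>(separatorChar: ',', hasHeader: false);
            return loader.Load(new MultiFileSource(paths));
        }

        public static void Main()
        {
            var mlContext = new MLContext(seed: 0);
            IDataView data = LoadLabeled(mlContext, "myNewYesSignalData.txt", "myNewNoSignalData.txt");

            // Rows arrive file by file, so shuffle to avoid all "yes" rows coming first.
            IDataView shuffled = mlContext.Data.ShuffleRows(data);
            var split = mlContext.Data.TrainTestSplit(shuffled, testFraction: 0.2);

            // Count the labels in the training portion as a sanity check.
            var rows = mlContext.Data
                .CreateEnumerable<Analog>(split.TrainSet, reuseRowObject: false)
                .ToList();
            Console.WriteLine($"yes: {rows.Count(r => r.Label == 1f)}, no: {rows.Count(r => r.Label == 0f)}");
        }
    }
    ```

    Note that ML.NET's binary classification trainers generally expect a Boolean label column, so the float Label above may need converting before training; check this against the trainer you choose.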
    

    Adding the label column can be done with a simple C# program, such as this:

    public class AnalogNoLabel
    {
        [LoadColumn(0, Global.SAMPLE_SIZE - 1)]
        [VectorType(Global.SAMPLE_SIZE)]
        public float[] DiscreteSignal { get; set; }
    }
    
    public void AddLabel(MLContext mCont)
    {
        IDataView data = mCont.Data.LoadFromTextFile<AnalogNoLabel>("myYesSignalData.csv", separatorChar: ',', hasHeader: false);
        var pipeline = mCont.Transforms.CustomMapping<AnalogNoLabel, Analog>((input, output) => {
            output.DiscreteSignal = input.DiscreteSignal;
            output.Label = 1;
        }, contractName: null);
        IDataView dataWithLabel = pipeline.Fit(data).Transform(data);
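        // SaveAsText defaults to tab-separated output with header and schema lines.
        // Override them so the file can be read back with separatorChar: ',' and hasHeader: false.
        using (var stream = new FileStream("myNewYesSignalData.txt", FileMode.Create))
            mCont.Data.SaveAsText(dataWithLabel, stream, separatorChar: ',', headerRow: false, schema: false, forceDense: true);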
    }
    
    

    A similar program for "myNoSignalData.csv" sets output.Label = 0 instead of output.Label = 1.
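
    If ML.NET is not needed for this preprocessing step, the label column could also be appended with plain file I/O. The following is a minimal sketch, not part of the original answer: the LabelAppender and AppendLabel names and the "myNewNoSignalData.txt" file name are hypothetical. It assumes each row is a comma-separated line with no trailing comma, and that the output path differs from the input path.

    ```csharp
    using System;
    using System.IO;
    using System.Linq;

    public static class LabelAppender
    {
        // Appends ",<label>" to every non-empty line of a comma-separated file.
        // The output path must differ from the input path, because the input is read lazily.
        public static void AppendLabel(string inputPath, string outputPath, int label)
        {
            var labeled = File.ReadLines(inputPath)
                .Where(line => !string.IsNullOrWhiteSpace(line))
                .Select(line => line.TrimEnd() + "," + label);
            File.WriteAllLines(outputPath, labeled);
        }

        public static void Main()
        {
            AppendLabel("myYesSignalData.csv", "myNewYesSignalData.txt", 1);
            AppendLabel("myNoSignalData.csv", "myNewNoSignalData.txt", 0);
        }
    }
    ```

    This skips the intermediate AnalogNoLabel class entirely, at the cost of doing no validation of the signal values themselves.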