c# .net machine-learning one-hot-encoding ml.net

How to test predictions after applying one-hot encoding during training using ml.net?


Below is the snippet I use for data processing:

    using System.Linq;
    using Microsoft.ML;
    using Microsoft.ML.Data;

    // Convert x and a to Single, one-hot encode b and c, keep only the new columns and the label
    var pipeline = _mlContext.Transforms.Conversion.ConvertType(new[]
        {
            new InputOutputColumnPair("x1", "x"),
            new InputOutputColumnPair("a1", "a"),
        }, outputKind: DataKind.Single)
        .Append(_mlContext.Transforms.Categorical.OneHotEncoding(new[]
        {
            new InputOutputColumnPair("b1", "b"),
            new InputOutputColumnPair("c1", "c"),
        }))
        .Append(_mlContext.Transforms.SelectColumns("x1", "a1", "b1", "c1", "Label"));

    data = pipeline.Fit(data).Transform(data);

    // Split the data into a training set and a test set
    var split = _mlContext.Data.TrainTestSplit(data, testFraction: 0.2);

    // Define the target column name
    var labelColumnName = nameof(Dataset.Label);

    // Get all column names except the Label column
    string[] featureColumnNames = data.Schema
        .Select(column => column.Name)
        .Where(name => name != labelColumnName)
        .ToArray();

    // Create the data process pipeline
    var dataProcessPipeline = _mlContext.Transforms.Concatenate("Features", featureColumnNames)
        .Append(_mlContext.Transforms.NormalizeMeanVariance(
            inputColumnName: "Features",
            outputColumnName: "FeaturesNormalizedByMeanVar"));
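For comparison, here is a minimal sketch of a different arrangement (an assumption on my part, not the poster's code): the type conversion, one-hot encoding, concatenation, normalization and trainer are appended into a single estimator chain, fitted once on the raw columns, and saved together with the raw input schema. Because the fitted one-hot transformers then live inside the saved model, a prediction engine can be fed raw x, a, b, c values directly. The Dataset, DataToPredict and PredictionEngineSchema shapes, the synthetic data, the file name and the choice of SdcaLogisticRegression as the trainer are all hypothetical.

```csharp
using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical row types; the real classes are not shown in the question.
public class Dataset
{
    public int x { get; set; }
    public int a { get; set; }
    public string b { get; set; }
    public string c { get; set; }
    public bool Label { get; set; }
}

public class DataToPredict
{
    public int x { get; set; }
    public int a { get; set; }
    public string b { get; set; }
    public string c { get; set; }
}

public class PredictionEngineSchema
{
    public bool PredictedLabel { get; set; }
    public float Probability { get; set; }
    public float Score { get; set; }
}

public static class OneChainModel
{
    public static void TrainAndSave(string path)
    {
        var mlContext = new MLContext(seed: 0);

        // Synthetic training rows, for illustration only.
        var rows = Enumerable.Range(0, 200).Select(i => new Dataset
        {
            x = i % 10,
            a = i % 7,
            b = i % 2 == 0 ? "red" : "blue",
            c = i % 3 == 0 ? "small" : "large",
            Label = i % 10 >= 5,
        });
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);
        var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);

        // Preprocessing and trainer in ONE estimator chain, fitted once on the raw columns.
        var pipeline = mlContext.Transforms.Conversion.ConvertType(new[]
            {
                new InputOutputColumnPair("x1", "x"),
                new InputOutputColumnPair("a1", "a"),
            }, outputKind: DataKind.Single)
            .Append(mlContext.Transforms.Categorical.OneHotEncoding(new[]
            {
                new InputOutputColumnPair("b1", "b"),
                new InputOutputColumnPair("c1", "c"),
            }))
            .Append(mlContext.Transforms.Concatenate("Features", "x1", "a1", "b1", "c1"))
            .Append(mlContext.Transforms.NormalizeMeanVariance("Features"))
            .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(
                labelColumnName: "Label", featureColumnName: "Features"));

        ITransformer model = pipeline.Fit(split.TrainSet);

        // Save with the RAW input schema, so the loaded model expects x, a, b, c.
        mlContext.Model.Save(model, split.TrainSet.Schema, path);
    }

    public static PredictionEngineSchema PredictOne(string path, DataToPredict input)
    {
        var mlContext = new MLContext();
        ITransformer model = mlContext.Model.Load(path, out _);
        // The loaded chain re-applies the one-hot vocabulary learned during training.
        var engine = mlContext.Model.CreatePredictionEngine<DataToPredict, PredictionEngineSchema>(model);
        return engine.Predict(input);
    }

    public static void Main()
    {
        TrainAndSave("model.zip");
        var p = PredictOne("model.zip", new DataToPredict { x = 8, a = 3, b = "red", c = "small" });
        Console.WriteLine($"PredictedLabel={p.PredictedLabel}, Probability={p.Probability:F3}");
    }
}
```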

When I try to do a prediction like this:

    public List<PredictionEngineOutput> Predict(string path, List<DataToPredict> dataPredict)
    {
        // Load the trained model and the input schema it was saved with
        var model = _mlContext.Model.Load(path, out var schema);

        // Create a prediction engine
        var engine = _mlContext.Model.CreatePredictionEngine<DataToPredict, PredictionEngineSchema>(model);

        var predictOutput = new List<PredictionEngineOutput>();
        foreach (var data in dataPredict)
        {
            // Make the prediction
            var prediction = engine.Predict(data);

            predictOutput.Add(new PredictionEngineOutput
            {
                PredictedLabel = prediction.PredictedLabel,
                Probability = prediction.Probability,
                Score = prediction.Score,
                // Copy the per-feature contributions out of the VBuffer (needs using System.Linq)
                FeatureContributions = prediction.FeatureContributions.DenseValues().ToList()
            });
        }
        return predictOutput;
    }

An exception is thrown on this line: var prediction = engine.Predict(data);

System.InvalidOperationException: Operation is not valid due to the current state of the object.
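One way to narrow this down, sketched here as an assumption rather than a confirmed diagnosis: Model.Load returns the input schema the model was saved with, and that can be compared with the public members of the input class. If the saved model was trained on the output of the first pipeline, it would expect x1, a1, b1 and c1 rather than the raw columns. The SchemaCheck helper and the DataToPredict shape below are hypothetical, matching is by member name only ([ColumnName] attributes are ignored), and the Main method only simulates such a schema. A Label column in the result is expected and harmless, since scoring does not need it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical prediction input with the raw columns.
public class DataToPredict
{
    public int x { get; set; }
    public int a { get; set; }
    public string b { get; set; }
    public string c { get; set; }
}

public static class SchemaCheck
{
    // Returns visible columns of the model's saved input schema that have no
    // public property or field with the same name on TInput.
    public static List<string> MissingInputColumns<TInput>(DataViewSchema modelInputSchema)
    {
        var flags = BindingFlags.Public | BindingFlags.Instance;
        var members = new HashSet<string>(
            typeof(TInput).GetProperties(flags).Select(p => p.Name)
                .Concat(typeof(TInput).GetFields(flags).Select(f => f.Name)));

        return modelInputSchema
            .Where(column => !column.IsHidden)
            .Select(column => column.Name)
            .Where(name => !members.Contains(name))
            .ToList();
    }

    public static void Main()
    {
        // Simulated schema of a model trained on already-transformed data (sizes made up).
        var builder = new DataViewSchema.Builder();
        builder.AddColumn("x1", NumberDataViewType.Single);
        builder.AddColumn("a1", NumberDataViewType.Single);
        builder.AddColumn("b1", new VectorDataViewType(NumberDataViewType.Single, 3));
        builder.AddColumn("c1", new VectorDataViewType(NumberDataViewType.Single, 2));
        builder.AddColumn("Label", BooleanDataViewType.Instance);

        Console.WriteLine(string.Join(", ", MissingInputColumns<DataToPredict>(builder.ToSchema())));
        // prints: x1, a1, b1, c1, Label
    }
}
```

With a real model, the schema would come from the out parameter instead: var model = _mlContext.Model.Load(path, out var schema); followed by SchemaCheck.MissingInputColumns<DataToPredict>(schema).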

When predicting, I also tried applying the same transformers to List<DataToPredict> dataPredict after loading it into an IDataView.
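A related pitfall, which may or may not be what happened here: if the preprocessing pipeline is fitted again on the prediction data instead of reusing the transformer fitted on the training data, the one-hot encoder learns a new vocabulary, so vector lengths and slot positions no longer match what the model was trained on. The plain C# sketch below uses a hypothetical OneHot class (not ML.NET's implementation) to illustrate this. It assigns slots in order of first occurrence and encodes unseen values as all zeros, which is my understanding of how ML.NET's default Indicator output behaves.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, hand-rolled one-hot encoder used only to illustrate the idea.
public sealed class OneHot
{
    private readonly Dictionary<string, int> _vocabulary = new Dictionary<string, int>();

    // "Fit": give each distinct value a slot, in order of first occurrence.
    public static OneHot Fit(IEnumerable<string> values)
    {
        var encoder = new OneHot();
        foreach (var v in values)
        {
            if (!encoder._vocabulary.ContainsKey(v))
            {
                encoder._vocabulary[v] = encoder._vocabulary.Count;
            }
        }
        return encoder;
    }

    public int Width => _vocabulary.Count;

    // "Transform": one slot per known value; an unseen value yields all zeros.
    public float[] Encode(string value)
    {
        var vector = new float[Width];
        if (_vocabulary.TryGetValue(value, out var index))
        {
            vector[index] = 1f;
        }
        return vector;
    }
}

public static class Demo
{
    public static void Main()
    {
        var trained = OneHot.Fit(new[] { "red", "blue", "green" });
        var refitted = OneHot.Fit(new[] { "blue" }); // refit on a single prediction row

        Console.WriteLine(string.Join(",", trained.Encode("blue")));  // 0,1,0
        Console.WriteLine(string.Join(",", refitted.Encode("blue"))); // 1
        Console.WriteLine(string.Join(",", trained.Encode("pink")));  // 0,0,0
    }
}
```

The refitted encoder produces a one-element vector where the model expects three elements, which is why the transformer fitted during training has to be the one applied at prediction time.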


Solution

  • I'm not sure why this issue happens, but if you want to predict on multiple rows (data instances), I would recommend using the Transform method instead. It would look something like this:

    var dataToPredictDataView = mlContext.Data.LoadFromEnumerable(dataPredict);
    var predictions = model.Transform(dataToPredictDataView);
    

    For more information, check out this how-to guide on making predictions.
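Expanding on the snippet above, here is a minimal sketch of the batch route, assuming the model was fitted with its preprocessing included: load the rows into an IDataView, call Transform, then read the scored columns back with mlContext.Data.CreateEnumerable. The row classes and the small demo model built by BuildDemoModel are hypothetical stand-ins so the example runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical row types (the question does not show its classes).
public class Dataset
{
    public int x { get; set; }
    public string b { get; set; }
    public bool Label { get; set; }
}

public class DataToPredict
{
    public int x { get; set; }
    public string b { get; set; }
}

public class PredictionEngineSchema
{
    public bool PredictedLabel { get; set; }
    public float Probability { get; set; }
    public float Score { get; set; }
}

public static class BatchPrediction
{
    // Scores every row with ITransformer.Transform instead of a PredictionEngine.
    public static List<PredictionEngineSchema> Predict(
        MLContext mlContext, ITransformer model, IEnumerable<DataToPredict> dataPredict)
    {
        IDataView input = mlContext.Data.LoadFromEnumerable(dataPredict);

        // Transform is lazy; rows are scored when the view is enumerated below.
        IDataView scored = model.Transform(input);

        // Copy PredictedLabel/Probability/Score into .NET objects; other columns are ignored.
        return mlContext.Data
            .CreateEnumerable<PredictionEngineSchema>(scored, reuseRowObject: false)
            .ToList();
    }

    // Small demo model: conversion, one-hot encoding and trainer fitted together on raw columns.
    public static ITransformer BuildDemoModel(MLContext mlContext)
    {
        var rows = Enumerable.Range(0, 100).Select(i => new Dataset
        {
            x = i % 10,
            b = i % 2 == 0 ? "red" : "blue",
            Label = i % 10 >= 5,
        });
        var pipeline = mlContext.Transforms.Conversion.ConvertType("x1", "x", DataKind.Single)
            .Append(mlContext.Transforms.Categorical.OneHotEncoding("b1", "b"))
            .Append(mlContext.Transforms.Concatenate("Features", "x1", "b1"))
            .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression());
        return pipeline.Fit(mlContext.Data.LoadFromEnumerable(rows));
    }

    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);
        ITransformer model = BuildDemoModel(mlContext);
        var predictions = Predict(mlContext, model, new[]
        {
            new DataToPredict { x = 8, b = "red" },
            new DataToPredict { x = 1, b = "blue" },
        });
        foreach (var p in predictions)
        {
            Console.WriteLine($"{p.PredictedLabel} {p.Probability:F3}");
        }
    }
}
```

In the question's setting, the model passed to Predict would come from _mlContext.Model.Load(path, out var schema) rather than from BuildDemoModel.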