
Batch Predictions in ML.NET with a MultiClass Classification Algorithm


I am trying to apply batch predictions, as shown in the Binary Classification example, to the Multi Class Classification example.

Many of the examples found through Google and in Microsoft's documentation use a single prediction to demonstrate the capability of a trained model.

However, I want to demonstrate the effectiveness of a trained model through batch prediction on a) multiple manually entered items and b) a file with no labels.

I have followed the Binary Classification example and tried to map it across to the Multi Class Classification example, but the predictions are not showing.

The following is the single prediction as shown in the Multi Class Classification example:

// Single Prediction

ITransformer loadedModel = _mlContext.Model.Load(_modelPath, out var modelInputSchema);

GitHubIssue singleIssue = new GitHubIssue() { Title = "Entity Framework crashes", Description = "When connecting to the database, EF is crashing" }; // Our single issue
_predEngine = _mlContext.Model.CreatePredictionEngine<GitHubIssue, IssuePrediction>(loadedModel);
var singleprediction = _predEngine.Predict(singleIssue);
Console.WriteLine($"=============== Single Prediction - Result: {singleprediction.Area} ===============");

The following is the batch prediction as shown in the Binary Classification example; it does not work in this scenario:

// Batch Predictions from Enumerable
ITransformer loadedModel = _mlContext.Model.Load(_modelPath, out var modelInputSchema);
IDataView batchIssues = _mlContext.Data.LoadFromEnumerable(issues);
IDataView predictions = loadedModel.Transform(batchIssues);

IEnumerable<GitHubIssue> predictedResults = _mlContext.Data.CreateEnumerable<GitHubIssue>(predictions, reuseRowObject: false);

foreach (GitHubIssue prediction in predictedResults)
{
    Console.WriteLine($"Title: {prediction.Title} | Prediction: {prediction.Area}");
}

The single prediction returns the following result:

area-System.Data

The batch version, however, prints no predicted value, and I suspect it is not predicting at all. Yet the material on Microsoft's ML.NET website says that Transform will use the model to make predictions across the batch of data:

"Use the model to predict the comment data sentiment using the Transform() method"

Problem 1) I am unsure what I am missing in the batch prediction from the enumerable in order to get prediction.Area, as shown for the single prediction.

Problem 2) How would I adapt the enumerable approach to load a file of unlabelled data and make predictions on it?


Solution

  • Please note: all code in this answer is placed within the PredictIssues method of the MultiClass Classification example.

    Answer to Problem 1

    Using the prediction engine (_predEngine) and calling its Predict function on each item inside the foreach loop worked. Two changes were required to do this:

    var batchPrediction = _predEngine;
    
    Console.WriteLine($"Prediction: {batchPrediction.Predict(prediction).Area}"); 
    

    In addition, I removed the following line:

    IDataView predictions = loadedModel.Transform(batchIssues);
    

    Removing this call made no difference to the predictions, because the loop no longer reads from the transformed data; each prediction now comes from the prediction engine. The full code that worked for me is as follows:

    IEnumerable<GitHubIssue> issues = new[]
    {
        new GitHubIssue
        {
             Title = "Entity Framework crashes",
             Description = "When connecting to the database, EF is crashing"
        },
        new GitHubIssue
        {
             Title = "Github Down",
             Description = "When going to the website, github says it is down"
        }
    
    };
    
    var batchPrediction = _predEngine;
    
    // Batch Predictions from Enumerable
    IDataView batchIssues = _mlContext.Data.LoadFromEnumerable(issues);
    
    
    IEnumerable<GitHubIssue> predictedResults = _mlContext.Data.CreateEnumerable<GitHubIssue>(batchIssues, reuseRowObject: false);
    
    foreach (GitHubIssue prediction in predictedResults)
    {
        Console.WriteLine($"Title: {prediction.Title} | Prediction: {batchPrediction.Predict(prediction).Area}");
    }
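
    For comparison, the Transform-based batch approach from the question can likely be made to work by changing only the type used to read the results back. In the original attempt the transformed rows were read as GitHubIssue, whose Area property binds to the (blank) input Area column rather than to the model's PredictedLabel column. The sketch below reads them as IssuePrediction instead. It assumes the loaded model is the tutorial's pipeline (ending in MapKeyToValue on PredictedLabel) and that GitHubIssue and IssuePrediction match the tutorial's classes; BatchPredictionSketch and PrintBatchPredictions are hypothetical names used only for illustration.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using Microsoft.ML;
    using Microsoft.ML.Data;

    // Assumed to mirror the tutorial's input class.
    public class GitHubIssue
    {
        [LoadColumn(0)] public string ID { get; set; }
        [LoadColumn(1)] public string Area { get; set; }
        [LoadColumn(2)] public string Title { get; set; }
        [LoadColumn(3)] public string Description { get; set; }
    }

    // Assumed to mirror the tutorial's output class:
    // Area is bound to the model's PredictedLabel column.
    public class IssuePrediction
    {
        [ColumnName("PredictedLabel")]
        public string Area;
    }

    public static class BatchPredictionSketch
    {
        // Hypothetical helper: scores every issue with one Transform call.
        public static void PrintBatchPredictions(
            MLContext mlContext, ITransformer model, IEnumerable<GitHubIssue> issues)
        {
            // Materialise once so inputs and outputs can be paired by position.
            List<GitHubIssue> issueList = issues.ToList();

            IDataView batchIssues = mlContext.Data.LoadFromEnumerable(issueList);
            IDataView predictions = model.Transform(batchIssues);

            // Read the rows back as IssuePrediction, not GitHubIssue.
            IEnumerable<IssuePrediction> predictedResults =
                mlContext.Data.CreateEnumerable<IssuePrediction>(predictions, reuseRowObject: false);

            // Rows come back in input order, so Zip pairs each issue with its prediction.
            foreach (var (issue, prediction) in issueList.Zip(predictedResults, (i, p) => (i, p)))
            {
                Console.WriteLine($"Title: {issue.Title} | Prediction: {prediction.Area}");
            }
        }
    }
    ```

    Inside PredictIssues this would be called as BatchPredictionSketch.PrintBatchPredictions(_mlContext, loadedModel, issues). Unlike the per-item Predict loop, it does not need a PredictionEngine at all.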
    

    Answer to Problem 2

    I created a new file with the columns ID, Area (left blank), Title and Description, mirroring the layout of the test and training data files.

    I added two static members at class scope (near the top of the class, next to the existing fields) as follows:

    private static string _myTestDataPath => Path.Combine(_appPath, "..", "..", "..", "Data", "myTestData.tsv");
    private static IDataView _myTestDataView;
    

    Instead of creating an IEnumerable, I loaded the file directly, as follows:

    _myTestDataView = _mlContext.Data.LoadFromTextFile<GitHubIssue>(_myTestDataPath, hasHeader: true);
    

    The following is a full example of the method:

    ITransformer loadedModel = _mlContext.Model.Load(_modelPath, out var modelInputSchema);
    _predEngine = _mlContext.Model.CreatePredictionEngine<GitHubIssue, IssuePrediction>(loadedModel);
    _myTestDataView = _mlContext.Data.LoadFromTextFile<GitHubIssue>(_myTestDataPath, hasHeader: true);
    IDataView predictions = loadedModel.Transform(_myTestDataView);
    var batchPrediction = _predEngine;
    IEnumerable<GitHubIssue> predictedResults =
        _mlContext.Data.CreateEnumerable<GitHubIssue>(predictions, reuseRowObject: false);
    
    foreach (GitHubIssue prediction in predictedResults)
    {
        Console.WriteLine($"Title: {prediction.Title} | Prediction: {batchPrediction.Predict(prediction).Area}");
    }
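
    Since Transform already scores every row of the file, a variant without the per-row Predict call is possible. The sketch below reads each transformed row into a small class that picks up both the Title column and the PredictedLabel column. It assumes the tutorial's pipeline and GitHubIssue class, and a tab-separated file with a header row; IssueWithPrediction, FileBatchPredictionSketch and PrintFilePredictions are hypothetical names introduced only for this illustration.

    ```csharp
    using System;
    using System.Collections.Generic;
    using Microsoft.ML;
    using Microsoft.ML.Data;

    // Assumed to mirror the tutorial's input class.
    public class GitHubIssue
    {
        [LoadColumn(0)] public string ID { get; set; }
        [LoadColumn(1)] public string Area { get; set; }
        [LoadColumn(2)] public string Title { get; set; }
        [LoadColumn(3)] public string Description { get; set; }
    }

    // Hypothetical output class: one pass-through input column
    // plus the model's predicted label.
    public class IssueWithPrediction
    {
        public string Title;

        [ColumnName("PredictedLabel")]
        public string Area;
    }

    public static class FileBatchPredictionSketch
    {
        // Hypothetical helper: loads an unlabelled file and prints one prediction per row.
        public static void PrintFilePredictions(
            MLContext mlContext, ITransformer model, string dataPath)
        {
            // Tab is the default separator for LoadFromTextFile.
            IDataView data = mlContext.Data.LoadFromTextFile<GitHubIssue>(dataPath, hasHeader: true);

            // A single Transform call scores the whole file.
            IDataView predictions = model.Transform(data);

            IEnumerable<IssueWithPrediction> results =
                mlContext.Data.CreateEnumerable<IssueWithPrediction>(predictions, reuseRowObject: false);

            foreach (IssueWithPrediction row in results)
            {
                Console.WriteLine($"Title: {row.Title} | Prediction: {row.Area}");
            }
        }
    }
    ```

    Inside PredictIssues this would be called as FileBatchPredictionSketch.PrintFilePredictions(_mlContext, loadedModel, _myTestDataPath).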
    

    As a side note, when comparing batch predictions with single predictions, you must append .Area to the result of Predict to print the predicted label, as shown at the end of each line below:

    // Manual Batch Predictions
    Console.WriteLine($"Title: {prediction.Title} | Prediction: {batchPrediction.Predict(prediction).Area}");
    
    // File-based Batch Predictions
    Console.WriteLine($"Title: {prediction.Title} | Prediction: {batchPrediction.Predict(prediction).Area}");