c# ml.net automl

Why does ML.NET AutoML throw a null reference exception even though I have data?


I am learning ML.NET and trying to use the AutoML API, but I get a null reference exception. I have updated the question with what I have learned since and with a minimal example that reproduces the problem.

Paste this into VS Code and you too can watch a two-dimensional vector blow up.

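using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

class Program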
{
    static void Main(string[] args)
    {
        var mlContext = new MLContext();

        // create schema for multidimensional vector
        var autoSchema = SchemaDefinition.Create(typeof(InputData));
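        // look the column up by name rather than relying on property order
        var col = autoSchema[nameof(InputData.MultiDimensional)];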
        col.ColumnType = new VectorDataViewType(NumberDataViewType.Single, 3, 60);

        // fabricate some data
        var trainingData = new List<InputData>();
        var inputData = new InputData();
        inputData.MultiDimensional = new float[3, 60]; // same 3 x 60 shape as declared in the schema
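        // GetLength returns the element count per dimension; GetUpperBound is the last index, so "<" would skip it
        for (int i = 0; i < inputData.MultiDimensional.GetLength(0); i++)
        {
            for (int j = 0; j < inputData.MultiDimensional.GetLength(1); j++)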
            {
                inputData.MultiDimensional[i,j] = 5; // doesn't matter
            }
        }
        trainingData.Add(inputData);

        // setup a data view
        IDataView trainingDataView = mlContext.Data.LoadFromEnumerable<InputData>(trainingData, autoSchema);

        // preview it (goes BOOM)
        var preview = trainingDataView.Preview();

        // run the experiment
        var settings = new BinaryExperimentSettings();
        settings.MaxExperimentTimeInSeconds = 60;
        ExperimentResult<BinaryClassificationMetrics> experimentResult = mlContext.Auto()
            .CreateBinaryClassificationExperiment(settings)
            .Execute(trainingDataView);
    }
}

public class InputData
{
    public bool Label { get; set; }
    public float[,] MultiDimensional { get; set; }
}

The documentation seems to indicate my setup is correct: https://learn.microsoft.com/en-us/dotnet/api/microsoft.ml.data.vectortypeattribute.-ctor?view=ml-dotnet#Microsoft_ML_Data_VectorTypeAttribute__ctor_System_Int32___

To fix my multidimensional vector problem, I've tried:

  • Removing the float[,] initializers in InputData
  • Specifying the exact size with [VectorType(3,60)] as appropriate for each property
  • Leaving the [VectorType] attribute off altogether and using the SchemaDefinition (autoSchema) to set the type.
  • Leaving the [VectorType] attribute off altogether and not using a SchemaDefinition, to let ML.NET figure it out on its own
  • Adding just [VectorType()], although the docs say that is for single-dimension arrays.

My question now is: what is the correct way to use vectors with more than one dimension in the AutoML part of ML.NET? Is this even possible?


Solution

  • Oh wow, I found my own question years later and it is still open. According to the GitHub issue posted in the comments, this is still not possible, and a GitHub issue from 2022 confirms that this is still the case.
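
Since multi-dimensional vectors are reported as unsupported, one workaround that may be worth trying is to flatten the float[,] into a one-dimensional float[] before loading it. This is not taken from the GitHub issues above; it is only a sketch under that assumption. The class name VectorFlattening and the method Flatten are hypothetical names chosen for illustration. Only the flattening step is shown, and it uses nothing outside the base class library.

```csharp
using System;

// Hypothetical helper, not part of ML.NET.
public static class VectorFlattening
{
    // Copies a rectangular float[rows, cols] array into a float[rows * cols]
    // array in row-major order: element [i, j] lands at index i * cols + j.
    public static float[] Flatten(float[,] matrix)
    {
        // For a multi-dimensional array, Length is the total element count.
        var flat = new float[matrix.Length];

        // Buffer.BlockCopy counts in bytes, hence the sizeof(float) factor.
        Buffer.BlockCopy(matrix, 0, flat, 0, matrix.Length * sizeof(float));
        return flat;
    }

    // Assumed usage: expose the result as a one-dimensional property, e.g.
    //   [VectorType(180)] public float[] Features { get; set; }
    // for a 3 x 60 input, instead of the float[,] property.
}
```

For the 3 x 60 case in the question, Flatten returns a 180-element array. The model then sees a flat feature vector with no notion of rows and columns, so whether that is acceptable depends on the data.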