ML.NET v1.4, expected Boolean, got Single exception


I want to train a binary classifier. I upgraded from ML.NET 0.9 to ML.NET 1.4, and my code now looks like this:

var mlContext = new MLContext();
var trainData = mlContext.Data.LoadFromTextFile<CancerData>("Cancer-train.csv", hasHeader: true, separatorChar: ';');
var pipeline = mlContext.Transforms.NormalizeMinMax("Features")
    .AppendCacheCheckpoint(mlContext)
    .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(labelColumnName: "Target", featureColumnName: "Features"));

var model = pipeline.Fit(trainData);

My test data looks like this:

B;11.49;14.59;73.99;404.9;0.1046;0.08228;0.05308;0.01969;0.1779;0.06574;0.2034;1.166;1.567;14.34;0.004957;0.02114;0.04156;0.008038;0.01843;0.003614;12.4;21.9;82.04;467.6;0.1352;0.201;0.2596;0.07431;0.2941;0.0918
M;16.25;19.51;109.8;815.8;0.1026;0.1893;0.2236;0.09194;0.2151;0.06578;0.3147;0.9857;3.07;33.12;0.009197;0.0547;0.08079;0.02215;0.02773;0.006355;17.39;23.05;122.1;939.7;0.1377;0.4462;0.5897;0.1775;0.3318;0.09136
B;12.16;18.03;78.29;455.3;0.09087;0.07838;0.02916;0.01527;0.1464;0.06284;0.2194;1.19;1.678;16.26;0.004911;0.01666;0.01397;0.005161;0.01454;0.001858;13.34;27.87;88.83;547.4;0.1208;0.2279;0.162;0.0569;0.2406;0.07729

And CancerData class:

class CancerData
{
    [LoadColumn(1, 30), ColumnName("Features")]
    public float[] FeatureVector { get; set; }

    [LoadColumn(31)]
    public float Target { get; set; }
}

With the code above, I get this error:

System.ArgumentOutOfRangeException: 'Schema mismatch for label column '': expected Boolean, got Single Arg_ParamName_Name'

I believe it's because the first column contains B/M rather than true/false values. Is there an elegant way to convert these values to true/false so that the trainer can fit without throwing? Does ML.NET provide a solution for this scenario? Or am I mistaken, and something else is wrong with my code?


Solution

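  • First of all, your target column is at index 0, not 31, right? In your sample rows the B/M label is the first field and the 30 features occupy indices 1 to 30, so index 31 does not exist.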

    I would just read it as text, and then transform to bool using MapValue:

    class CancerData
    {
        [LoadColumn(1, 30), ColumnName("Features")]
        public float[] FeatureVector { get; set; }
    
        [LoadColumn(0)]
        public string Target { get; set; }
    }
    
    // ...
    
    var trainData = mlContext.Data.LoadFromTextFile<CancerData>("Cancer-train.csv", hasHeader: true, separatorChar: ';');
    
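    // Lookup table from the text label to the Boolean label the binary trainer expects
    // (Dictionary requires "using System.Collections.Generic;").
    var targetMap = new Dictionary<string, bool> { { "M", true }, { "B", false } };
    
    // MapValue replaces the text "Target" column with its Boolean mapping before training.
    var pipeline = mlContext.Transforms.Conversion.MapValue("Target", targetMap)
        .Append(mlContext.Transforms.NormalizeMinMax("Features"))
        .AppendCacheCheckpoint(mlContext)
        .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(labelColumnName: "Target", featureColumnName: "Features"));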
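
For completeness, here is a minimal sketch of how the pieces above might fit together end to end, including fitting and a quick evaluation. It assumes the Microsoft.ML NuGet package (1.4) is referenced. The `Program` class and the held-out file name "Cancer-test.csv" are hypothetical and not part of the question; the test file is assumed to have the same layout as the training file.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

class CancerData
{
    // Columns 1..30 hold the 30 numeric features.
    [LoadColumn(1, 30), ColumnName("Features")]
    public float[] FeatureVector { get; set; }

    // Column 0 holds the text label: "M" or "B".
    [LoadColumn(0)]
    public string Target { get; set; }
}

class Program
{
    static void Main()
    {
        var mlContext = new MLContext();

        var trainData = mlContext.Data.LoadFromTextFile<CancerData>(
            "Cancer-train.csv", hasHeader: true, separatorChar: ';');

        // Hypothetical held-out file, assumed to share the training file's layout.
        var testData = mlContext.Data.LoadFromTextFile<CancerData>(
            "Cancer-test.csv", hasHeader: true, separatorChar: ';');

        // Text label -> Boolean label expected by binary trainers.
        var targetMap = new Dictionary<string, bool> { { "M", true }, { "B", false } };

        var pipeline = mlContext.Transforms.Conversion.MapValue("Target", targetMap)
            .Append(mlContext.Transforms.NormalizeMinMax("Features"))
            .AppendCacheCheckpoint(mlContext)
            .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(
                labelColumnName: "Target", featureColumnName: "Features"));

        // Fit on the training set, then score the held-out set with the same pipeline.
        var model = pipeline.Fit(trainData);
        var predictions = model.Transform(testData);

        // The label column is named "Target" here, not the default "Label".
        var metrics = mlContext.BinaryClassification.Evaluate(
            predictions, labelColumnName: "Target");

        Console.WriteLine($"Accuracy: {metrics.Accuracy:F3}");
        Console.WriteLine($"AUC:      {metrics.AreaUnderRocCurve:F3}");
    }
}
```

Because the mapping is part of the pipeline, the same B/M to Boolean conversion is applied to the test data during `Transform`, so the evaluator sees a Boolean label as well.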