I want to train a binary classifier. I upgraded from ML.NET 0.9 to ML.NET 1.4. Now my code looks like this:
var mlContext = new MLContext();
var trainData = mlContext.Data.LoadFromTextFile<CancerData>("Cancer-train.csv", hasHeader: true, separatorChar: ';');
var pipeline = mlContext.Transforms.NormalizeMinMax("Features")
    .AppendCacheCheckpoint(mlContext)
    .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(labelColumnName: "Target", featureColumnName: "Features"));
var model = pipeline.Fit(trainData);
My test data looks like this:
B;11.49;14.59;73.99;404.9;0.1046;0.08228;0.05308;0.01969;0.1779;0.06574;0.2034;1.166;1.567;14.34;0.004957;0.02114;0.04156;0.008038;0.01843;0.003614;12.4;21.9;82.04;467.6;0.1352;0.201;0.2596;0.07431;0.2941;0.0918
M;16.25;19.51;109.8;815.8;0.1026;0.1893;0.2236;0.09194;0.2151;0.06578;0.3147;0.9857;3.07;33.12;0.009197;0.0547;0.08079;0.02215;0.02773;0.006355;17.39;23.05;122.1;939.7;0.1377;0.4462;0.5897;0.1775;0.3318;0.09136
B;12.16;18.03;78.29;455.3;0.09087;0.07838;0.02916;0.01527;0.1464;0.06284;0.2194;1.19;1.678;16.26;0.004911;0.01666;0.01397;0.005161;0.01454;0.001858;13.34;27.87;88.83;547.4;0.1208;0.2279;0.162;0.0569;0.2406;0.07729
And the CancerData class:
class CancerData
{
    [LoadColumn(1, 30), ColumnName("Features")]
    public float[] FeatureVector { get; set; }
    [LoadColumn(31)]
    public float Target { get; set; }
}
With the code above, I get this error:
System.ArgumentOutOfRangeException: 'Schema mismatch for label column '': expected Boolean, got Single Arg_ParamName_Name'
I believe it's because my first column holds B/M rather than true/false values. How can I convert these values, in an elegant way, to true/false values that the trainer can fit without an exception? Does ML.NET provide a solution for this scenario? Or am I mistaken, and something else is wrong with my code?
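First of all, your target column is not the 31st but the 0th, right? LoadColumn indices are zero-based, so the B/M label is column 0 and the 30 features are columns 1 to 30; there is no column 31.
I would just read it as text and then convert it to bool using MapValue: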
class CancerData
{
    [LoadColumn(1, 30), ColumnName("Features")]
    public float[] FeatureVector { get; set; }
    [LoadColumn(0)]
    public string Target { get; set; }
}
// ...
var trainData = mlContext.Data.LoadFromTextFile<CancerData>("Cancer-train.csv", hasHeader: true, separatorChar: ';');
// Map the text label to the Boolean the trainer expects: "M" -> true, "B" -> false.
var targetMap = new Dictionary<string, bool> { { "M", true }, { "B", false } };
// MapValue overwrites "Target" in place, so the trainer sees a Boolean label column.
var pipeline = mlContext.Transforms.Conversion.MapValue("Target", targetMap)
    .Append(mlContext.Transforms.NormalizeMinMax("Features"))
    .AppendCacheCheckpoint(mlContext)
    .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(labelColumnName: "Target", featureColumnName: "Features"));
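To check that the mapped label actually trains, here is a minimal end-to-end sketch of the same pipeline with a fit and an evaluation step added. It is only an illustration under a few assumptions that are not in the question: it holds out 20% of Cancer-train.csv for evaluation (the split and the fixed seed are my choice), and it assumes the file really has a header row, as the hasHeader: true argument in the question implies.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

class CancerData
{
    // Columns 1..30 hold the 30 numeric features.
    [LoadColumn(1, 30), ColumnName("Features")]
    public float[] FeatureVector { get; set; }

    // Column 0 holds the text label, "B" or "M".
    [LoadColumn(0)]
    public string Target { get; set; }
}

class Program
{
    static void Main()
    {
        // Assumption: a fixed seed, only to make the split repeatable.
        var mlContext = new MLContext(seed: 0);

        var data = mlContext.Data.LoadFromTextFile<CancerData>(
            "Cancer-train.csv", hasHeader: true, separatorChar: ';');

        // Assumption: hold out 20% of the rows for evaluation.
        var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);

        // Convert the text label to the Boolean label the trainer requires.
        var targetMap = new Dictionary<string, bool> { { "M", true }, { "B", false } };

        var pipeline = mlContext.Transforms.Conversion.MapValue("Target", targetMap)
            .Append(mlContext.Transforms.NormalizeMinMax("Features"))
            .AppendCacheCheckpoint(mlContext)
            .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(
                labelColumnName: "Target", featureColumnName: "Features"));

        var model = pipeline.Fit(split.TrainSet);

        // The fitted chain applies the same label mapping to the held-out rows,
        // so "Target" is Boolean again when the evaluator reads it.
        var predictions = model.Transform(split.TestSet);
        var metrics = mlContext.BinaryClassification.Evaluate(
            predictions, labelColumnName: "Target");

        Console.WriteLine($"Accuracy: {metrics.Accuracy:F3}, AUC: {metrics.AreaUnderRocCurve:F3}");
    }
}
```

Because the mapping is part of the pipeline rather than a preprocessing step on the file, the same fitted model can be applied to any data loaded with the CancerData class without converting B/M by hand.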