c# .net-core ml.net

Training set has 0 instances, abort training exception


I'm porting my project to ML.NET 0.10. I get the data from this link, and it looks like this (I saved it as a .csv file in this form):

diagnosis;radius_mean;texture_mean;perimeter_mean;area_mean;smoothness_mean;compactness_mean;concavity_mean;concave points_mean;symmetry_mean;fractal_dimension_mean;radius_se;texture_se;perimeter_se;area_se;smoothness_se;compactness_se;concavity_se;concave points_se;symmetry_se;fractal_dimension_se;radius_worst;texture_worst;perimeter_worst;area_worst;smoothness_worst;compactness_worst;concavity_worst;concave points_worst;symmetry_worst;fractal_dimension_worst
B;11.62;18.18;76.38;408.8;0.1175;0.1483;0.102;0.05564;0.1957;0.07255;0.4101;1.74;3.027;27.85;0.01459;0.03206;0.04961;0.01841;0.01807;0.005217;13.36;25.4;88.14;528.1;0.178;0.2878;0.3186;0.1416;0.266;0.0927
B;9.667;18.49;61.49;289.1;0.08946;0.06258;0.02948;0.01514;0.2238;0.06413;0.3776;1.35;2.569;22.73;0.007501;0.01989;0.02714;0.009883;0.0196;0.003913;11.14;25.62;70.88;385.2;0.1234;0.1542;0.1277;0.0656;0.3174;0.08524

My data class looks like this:

class CancerData
{
    [LoadColumn(0, 30), ColumnName("Features")]
    public float FeatureVector { get; set; }

    [LoadColumn(31)]
    public float Target { get; set; }
}

Now, my Program.cs file:

var mlContext = new MLContext();
var trainData = mlContext.Data.ReadFromTextFile<CancerData>("Cancer-train.csv", 
                             hasHeader: true, 
                             separatorChar: ';');

var pipeline = mlContext.Transforms
                        .Normalize("Features")
                        .AppendCacheCheckpoint(mlContext)
            .Append(mlContext.BinaryClassification.Trainers.StochasticDualCoordinateAscent(labelColumn: "Target", featureColumn: "Features"));

var model = pipeline.Fit(trainData);

var testData = mlContext.Data.ReadFromTextFile<CancerData>("Cancer-test.csv", 
                             hasHeader: true, 
                             separatorChar: ';');

var metrics = mlContext.BinaryClassification.Evaluate(model.Transform(testData), label: "Target");

When I run this code, I get an exception that says:

System.InvalidOperationException: 'Training set has 0 instances, aborting training.'

My question is: is my code correct? My .csv files are in the project folder, and the same setup worked with ML.NET 0.5. Thanks for any advice!


Solution

  • LoadColumn(0, 30) loads columns 0 through 30 into the property, yet FeatureVector is declared as a single float. It should at least be a float[].

    The first column, however, contains text (the diagnosis), so it should be excluded from the FeatureVector array.

    CancerData should probably look like this:

    class CancerData
    {
        [LoadColumn(1, 30), ColumnName("Features")]
        public float[] FeatureVector { get; set; }
    
        [LoadColumn(31)]
        public float Target { get; set; }
    }
    

    If the diagnosis column is needed, it should be:

    class CancerData
    {
        [LoadColumn(0)]
        public string Diagnosis { get; set; }
    
        [LoadColumn(1, 30), ColumnName("Features")]
        public float[] FeatureVector { get; set; }
    
        [LoadColumn(31)]
        public float Target { get; set; }
    }
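
Before settling on the indices, it may help to check how many columns the file really has. The header shown in the question lists 31 names (indices 0 to 30), so index 31 used for Target would only exist if the real file has an extra column. Below is a minimal sketch of such a check using only the standard library. The ColumnCheck class and its method names are hypothetical helpers, not part of ML.NET.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper for inspecting a delimited file; not an ML.NET API.
public static class ColumnCheck
{
    // Number of fields in one line of the file.
    public static int CountColumns(string line, char separator)
    {
        return line.Split(separator).Length;
    }

    // Zero-based indices of the fields that cannot be parsed as a float.
    public static int[] NonNumericColumns(string line, char separator)
    {
        return line.Split(separator)
                   .Select((field, index) => new { field, index })
                   .Where(x => !float.TryParse(x.field, NumberStyles.Float,
                                               CultureInfo.InvariantCulture, out _))
                   .Select(x => x.index)
                   .ToArray();
    }

    public static void Main()
    {
        // First data row from the question.
        const string row = "B;11.62;18.18;76.38;408.8;0.1175;0.1483;0.102;0.05564;0.1957;0.07255;0.4101;1.74;3.027;27.85;0.01459;0.03206;0.04961;0.01841;0.01807;0.005217;13.36;25.4;88.14;528.1;0.178;0.2878;0.3186;0.1416;0.266;0.0927";

        int columns = CountColumns(row, ';');
        Console.WriteLine("Columns: " + columns);
        Console.WriteLine("Highest valid index: " + (columns - 1));
        Console.WriteLine("Non-numeric columns: " + string.Join(", ", NonNumericColumns(row, ';')));
    }
}
```

For the row above, this reports 31 columns and flags only column 0 as non-numeric, which is consistent with LoadColumn(1, 30) for the features. If the real file also has no column 31, the label would have to come from somewhere else, most likely the diagnosis column.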