Train a model without labeling the features in ML.NET


I would like to train a model on a large list of features, where each feature indicates whether a specific keyword appears on a page. The feature list is so large that I cannot declare every feature as its own named column, as the ML.NET tutorial suggests:

public class IrisData
{
    [LoadColumn(0)]
    public float SepalLength;

    [LoadColumn(1)]
    public float SepalWidth;

    [LoadColumn(2)]
    public float PetalLength;

    [LoadColumn(3)]
    public float PetalWidth;

    [LoadColumn(4)]
    public string Label;
}

Instead, I would like to pass it a list of unnamed features, much as you can with scikit-learn in Python by simply giving it an array of features [[0,0,1],[0,1,0]] and an array of labels ["ShoppingSite", "SocialNetwork"].


Solution

  • Are all your features of the same type, for example booleans? If so, you can load all of them into a single vector column using TextLoader.Range(startIndex, endIndex): https://github.com/dotnet/machinelearning/blob/master/docs/code/MlNetCookBook.md#how-do-i-load-data-with-many-columns-from-a-csv

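    // Note: in ML.NET 1.0 and later, CreateTextReader is named CreateTextLoader
    // and DataKind.R4 is named DataKind.Single.
    var reader = mlContext.Data.CreateTextReader(new[] {
            // Range bounds are inclusive, so Range(0, 10) reads the first 11 values
            // (columns 0 through 10) as a single float vector.
            new TextLoader.Column("FeatureVector", DataKind.R4, new[] {new TextLoader.Range(0, 10)}),
            // Separately, read the target variable from column 11.
            new TextLoader.Column("Target", DataKind.R4, 11)
        },
        // Default separator is tab, but we need a comma.
        separatorChar: ',');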
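
If the features are already in memory rather than in a file, something close to the sklearn call from the question can probably be had with a single vector-typed property. The following is only a minimal sketch, assuming ML.NET 1.0 or later (where some API names differ from the snippet above). The class name PageData, the vector size of 3, and the choice of the SDCA maximum-entropy trainer are all assumptions made for illustration, not something the original answer prescribes.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical row type: one fixed-length vector instead of one field per feature.
public class PageData
{
    // The vector length (3) is an assumption; set it to the number of keywords.
    [VectorType(3)]
    public float[] Features { get; set; }

    public string Label { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);

        // Same toy data as in the question: unnamed features plus a label per row.
        var rows = new[]
        {
            new PageData { Features = new float[] { 0, 0, 1 }, Label = "ShoppingSite" },
            new PageData { Features = new float[] { 0, 1, 0 }, Label = "SocialNetwork" },
        };

        // Build an IDataView directly from the in-memory objects.
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        // Multiclass trainers expect a key-typed label, so map the strings to keys first.
        var pipeline = mlContext.Transforms.Conversion.MapValueToKey("Label")
            .Append(mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy("Label", "Features"));

        var model = pipeline.Fit(data);

        Console.WriteLine("Model trained on a single vector-typed feature column.");
    }
}
```

Two rows are far too few to learn anything useful; the point is only that no per-feature field or LoadColumn attribute is needed once the features live in one vector column.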