c# asp.net .net machine-learning ml.net

C# ML.NET ProjectToPrincipalComponents - System.ArgumentOutOfRangeException: 'Schema mismatch for input column'


I am trying to transform a dataview by calculating a PCA using the method ProjectToPrincipalComponents.

Each object is defined as follows:

public class thisItem
{
    public int itemName { get; set; }
    [ColumnName("Prices")]
    public double[] Prices { get; set; }
}

Then I have a list:

List<thisItem> allItems = new();

I want to read this list in ML.NET:

var mlContext = new MLContext();

// Every Prices array is assumed to have the same length as the first one.
int numberOfFeatures = allItems.First().Prices.Length;

// Declare "Prices" as a fixed-size vector, since PCA expects a known-size vector.
SchemaDefinition schemaDef = SchemaDefinition.Create(typeof(thisItem), SchemaDefinition.Direction.Both);
PrimitiveDataViewType itemType = ((VectorDataViewType)schemaDef["Prices"].ColumnType).ItemType;
schemaDef["Prices"].ColumnType = new VectorDataViewType(itemType, numberOfFeatures);

IDataView dataView = mlContext.Data.LoadFromEnumerable(allItems, schemaDef);

Microsoft.ML.Transforms.PrincipalComponentAnalyzer pipeline = mlContext.Transforms.ProjectToPrincipalComponents(
    outputColumnName: "Prices", inputColumnName: "Prices", rank: 10, seed: 1);

ITransformer datatransf = pipeline.Fit(dataView);

As soon as the last line runs, I get this error:

System.ArgumentOutOfRangeException: 'Schema mismatch for input column 'Prices': expected known-size vector of Single of two or more items, got Vector<Double, 17> (Parameter 'inputSchema')'

What could be wrong? I've been on this for hours and have read all the documentation, and all the GitHub examples I found are out of date.


Solution

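  • In case anyone else runs into this problem: it took me a full week, but the solution is extremely simple. The array must be float[] instead of double[]. The error message says as much: the transform expects a known-size vector of Single (float) and was given Vector<Double, 17>.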

    Only one line must change:

    public double[] Prices { get; set; }
    

    should be replaced by

    public float[] Prices { get; set; }
    

    That's all.
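
If the prices come from somewhere that only hands out double[] and the source class cannot be changed, one option is to convert each item before loading it into ML.NET. The following is a minimal sketch using only the base class library. RawItem, PcaItem and PriceConversion are hypothetical names introduced for illustration; they are not part of ML.NET or of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the original class, which stores prices as double[].
public class RawItem
{
    public int itemName { get; set; }
    public double[] Prices { get; set; }
}

// Hypothetical float[] version, matching the Single vector the PCA transform asks for.
public class PcaItem
{
    public int itemName { get; set; }
    public float[] Prices { get; set; }
}

public static class PriceConversion
{
    // Narrows every double to float. Values keep roughly 7 significant digits.
    public static PcaItem ToPcaItem(RawItem raw) => new PcaItem
    {
        itemName = raw.itemName,
        Prices = Array.ConvertAll(raw.Prices, p => (float)p)
    };

    // Converts a whole sequence, preserving order.
    public static List<PcaItem> ToPcaItems(IEnumerable<RawItem> raws) =>
        raws.Select(ToPcaItem).ToList();
}
```

The resulting list can then be passed to LoadFromEnumerable in place of allItems, with the schema definition built from typeof(PcaItem) instead of typeof(thisItem).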