I am trying to transform a dataview by calculating a PCA using the method ProjectToPrincipalComponents.
Each object is defined as follows:
public class thisItem
{
    public int itemName { get; set; }

    [ColumnName("Prices")]
    public double[] Prices { get; set; }
}
Then I have a list:
List<thisItem> allItems = new();
I want to read this list in ML.NET:
var mlContext = new MLContext();
int numberOfFeatures = allItems.First().Prices.Length;
SchemaDefinition schemaDef = SchemaDefinition.Create(typeof(thisItem), SchemaDefinition.Direction.Both);
PrimitiveDataViewType itemType = ((VectorDataViewType)schemaDef["Prices"].ColumnType).ItemType;
schemaDef["Prices"].ColumnType = new VectorDataViewType(itemType, numberOfFeatures);
IDataView dataView = mlContext.Data.LoadFromEnumerable(allItems, schemaDef);
Microsoft.ML.Transforms.PrincipalComponentAnalyzer pipeline = mlContext.Transforms.ProjectToPrincipalComponents
(outputColumnName: "Prices", inputColumnName: "Prices", rank: 10, seed: 1);
ITransformer datatransf = pipeline.Fit(dataView);
As soon as it runs the last line, I get the error: System.ArgumentOutOfRangeException: 'Schema mismatch for input column 'Prices': expected known-size vector of Single of two or more items, got Vector<Double, 17> (Parameter 'inputSchema')'
What could be wrong? I've been on this for hours and have read all the documentation, but all the GitHub examples I found are out of date.
In case anyone else runs into this problem: it took me a full week, but the solution is actually extremely simple. As the error message says, the transform expects a known-size vector of Single, so the type of the array must be float[] instead of double[].
Only one line must change:
public double[] Prices { get; set; }
should be replaced by
public float[] Prices { get; set; }
That's all.
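If the prices come from a source that only provides double[], the values can be converted before they are assigned to the float[] property. The following is a minimal sketch using only the standard library; the variable names rawPrices and prices are hypothetical and not part of the original code:

```csharp
using System;

// Hypothetical input: prices that arrive as double[] from some other source.
double[] rawPrices = { 1.5, 2.25, 3.0 };

// Cast each element to float (System.Single), the element type the error
// message asks for. Precision beyond what a float can hold is lost.
float[] prices = Array.ConvertAll(rawPrices, p => (float)p);

Console.WriteLine(prices.Length);
```

The resulting array can then be assigned to the Prices property of each thisItem before calling LoadFromEnumerable.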