I want to use the ML.NET KMeans algorithm, but I do not know at compile time how many features the dataset will have.
I see that the length passed to the VectorType attribute must be a constant, so passing it in as a variable will not work.
class Data
{
    public string ID { get; set; }
    [VectorType(5)] // I do not know whether the data will contain 5 features or more
    public float[] Features { get; set; }
}
To be used:
InputData row = new InputData { AssetID = Data[0, i + 1].ToString(), Features = features };
var context = new MLContext();
var DataView = context.Data.LoadFromEnumerable(dataArray);
string featuresColumnName = "Features";
var pipeline = context.Transforms.Concatenate(featuresColumnName, "Features")
    .Append(context.Clustering.Trainers.KMeans(featuresColumnName, clustersCount: NumberClusters));
var model = pipeline.Fit(DataView);
If every row has the same vector dimension, and that dimension is only known at run time, you can work around this by setting the size on the schema at run time:
private class SampleTemperatureDataVector
{
    public DateTime Date { get; set; }
    public float[] Temperature { get; set; }
}
Notice that this type has no annotations. You can create a SchemaDefinition from it and then modify that schema. In the initial SchemaDefinition, the vector column's type has its IsKnownSize property set to false. After the modification, its Size will be the dimension you set, 3 in this case.
var data2 = new SampleTemperatureDataVector[]
{
    new SampleTemperatureDataVector
    {
        Date = DateTime.UtcNow,
        Temperature = new float[] { 1.2f, 3.4f, 5.6f }
    },
    new SampleTemperatureDataVector
    {
        Date = DateTime.UtcNow,
        Temperature = new float[] { 1.2f, 3.4f, 5.6f }
    },
};
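var mlContext = new MLContext();
int featureDimension = 3; // known only at run time; every row must have this length
// Build the schema from the un-annotated class; the vector column starts with unknown size.
var autoSchema = SchemaDefinition.Create(typeof(SampleTemperatureDataVector));
var featureColumn = autoSchema[1]; // index 1 is Temperature (index 0 is Date)
var itemType = ((VectorDataViewType)featureColumn.ColumnType).ItemType;
// Replace the column type with a fixed-size vector of the same item type.
featureColumn.ColumnType = new VectorDataViewType(itemType, featureDimension);
IDataView data3 = mlContext.Data.LoadFromEnumerable(data2, autoSchema);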
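To tie this back to the question, here is a minimal sketch of the same idea applied to KMeans. It is an illustration under a few assumptions, not a definitive implementation: Row and RuntimeSizedKMeans are hypothetical names, the feature count is taken from the first row, and the numberOfClusters parameter name assumes ML.NET 1.x (the question's snippet uses clustersCount, which appears to come from an earlier release).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical row type: no [VectorType] attribute, the size is set at run time.
public class Row
{
    public string ID { get; set; }
    public float[] Features { get; set; }
}

public static class RuntimeSizedKMeans
{
    // Trains KMeans on rows whose feature count is only known at run time.
    public static ITransformer Train(MLContext context, IReadOnlyList<Row> rows, int clusters)
    {
        // Assumption: the first row defines the dimension and all rows must match it.
        int featureDimension = rows[0].Features.Length;
        if (rows.Any(r => r.Features.Length != featureDimension))
            throw new ArgumentException("All rows must have the same number of features.");

        // Start from the automatic schema, then give the vector column a known size.
        var schema = SchemaDefinition.Create(typeof(Row));
        schema["Features"].ColumnType =
            new VectorDataViewType(NumberDataViewType.Single, featureDimension);

        IDataView view = context.Data.LoadFromEnumerable(rows, schema);

        // The column is already a fixed-size float vector, so no Concatenate step is needed.
        var trainer = context.Clustering.Trainers.KMeans(
            featureColumnName: "Features", numberOfClusters: clusters);
        return trainer.Fit(view);
    }
}
```

Calling RuntimeSizedKMeans.Train(new MLContext(), rows, NumberClusters) should then behave like the pipeline in the question, with the vector size coming from the data instead of an attribute. Looking the column up by name rather than by index avoids depending on property order.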