c#, machine-learning, .net-core, ml.net

How to define usable input data?


I am collecting data that is stored in a few database tables. I plan to use this data to retrain a model regularly; however, the data is more complicated than in any guide or tutorial I've seen, and I haven't managed to find any examples of similar data.

In essence, I have about 1,500 sets of data in the following structure:

public class Tracker
{
    public string Name { get; set; } // not important
    public string[] Categories { get; set; } // may have some effect on the patterns the metrics display
    public List<Metric> Metrics { get; set; } // historical data to train the model with
}

public class Metric
{
    public DateTime Date { get; set; }
    public long Value { get; set; }
}

How should I go about mapping that object to an input model/IDataView? Should I look into the concatenation method?

I've seen some examples where the input model uses float arrays with VectorType attributes, but I can't fathom how to use this or whether it's even the right path to go down.


Solution

  • I ended up flattening my input model: I iterate through each tracker and each of its metrics to create one large data set of rows shaped like this:

    public record InputModel
    {
        public string[] Categories;
        public float Value;
        public float Timestamp;
    }
    

    To create the estimator, I did the following:

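    // Use Value as the regression target.
    var estimator = _mlContext.Transforms.CopyColumns("Label", "Value")
                        // Combine the numeric columns. Note: "Value" is also the label, so keeping it here leaks the target into the features.
                        .Append(_mlContext.Transforms.Concatenate("NumFeatures", "Value", "Timestamp"))
                        // One-hot encode the category strings.
                        .Append(_mlContext.Transforms.Categorical.OneHotEncoding("CatFeatures", "Categories"))
                        // Merge everything into the single "Features" column the trainer reads by default.
                        .Append(_mlContext.Transforms.Concatenate("Features", "NumFeatures", "CatFeatures"))
                        .Append(_mlContext.Regression.Trainers.Sdca());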
    

    It seems to be working so far, although the data is a bit chaotic.
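The flattening step described in this answer is not shown, so here is a minimal sketch of what it might look like. It reuses the Tracker and Metric classes from the question and the InputModel record from the answer. The TrackerFlattener class, its Flatten method, and the choice to encode dates as days since 1 January 2000 are all assumptions made for illustration; the answer does not say how Timestamp was actually computed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Metric
{
    public DateTime Date { get; set; }
    public long Value { get; set; }
}

public class Tracker
{
    public string Name { get; set; } = "";
    public string[] Categories { get; set; } = Array.Empty<string>();
    public List<Metric> Metrics { get; set; } = new();
}

// Flat row shape from the answer: one row per (tracker, metric) pair.
public record InputModel
{
    public string[] Categories = Array.Empty<string>();
    public float Value;
    public float Timestamp;
}

// Hypothetical helper, not part of ML.NET or the original answer.
public static class TrackerFlattener
{
    // Assumption: dates are encoded as days since this reference date.
    // Days stay small enough for a float; Unix seconds would lose precision.
    private static readonly DateTime ReferenceDate = new DateTime(2000, 1, 1);

    public static List<InputModel> Flatten(IEnumerable<Tracker> trackers) =>
        trackers
            // Emit one row per metric, repeating the parent tracker's categories.
            .SelectMany(t => t.Metrics.Select(m => new InputModel
            {
                Categories = t.Categories,
                Value = m.Value, // implicit long -> float conversion
                Timestamp = (float)(m.Date - ReferenceDate).TotalDays
            }))
            .ToList();
}
```

The resulting list could then be passed to MLContext.Data.LoadFromEnumerable to obtain an IDataView for training. Name is dropped on purpose, since the question marks it as not important.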