c# machine-learning outliers ml.net anomaly-detection

ML.NET RandomizedPCA Trainer AUC not defined


This is my first time using ML.NET, and I wanted to try out anomaly detection on data from a database. I retrieve the following data:

    public string Title { get; set; }
    public string CertValidFrom { get; set; }
    public string CertValidTo { get; set; }
    public float Label { get; set; }

Label is needed for the RandomizedPcaTrainer and is set to 0. I also featurized the text columns:

     IEstimator<ITransformer> dataProcessPipeline = mLContext.Transforms
         .Text.FeaturizeText("TitleF", "Title")
         .Append(mLContext.Transforms.Text.FeaturizeText("CertValidFromF", "CertValidFrom"))
         .Append(mLContext.Transforms.Text.FeaturizeText("CertValidToF", "CertValidTo"))
         .Append(mLContext.Transforms.Concatenate("Features", "TitleF", "CertValidFromF", "CertValidToF"));

Then I used the following options:

    var options = new RandomizedPcaTrainer.Options
    {
        FeatureColumnName = "Features",
        ExampleWeightColumnName = null,
        Rank = 28,
        Oversampling = 20,
        EnsureZeroMean = true,
        Seed = 1
    };

But when evaluating the model with the test data I get the following error:

An unhandled exception of type 'System.ArgumentOutOfRangeException' occurred in Microsoft.ML.Core.dll: 'AUC is not defined when there is no positive class in the data'

I don't have a lot of experience, so my questions are:

  • What does this error say?
  • Is the dataProcessPipeline correct and what does it do?

Solution

  • Okay, long story short: this algorithm did not fit my dataset, so I tried some different ones. Always research the use cases of the algorithm you choose.
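
On the first question, the message can be read fairly literally. AUC compares the scores of positive (anomalous) rows against the scores of negative (normal) rows, and with Label set to 0 as in the question, the evaluation data has no positive rows to compare. The sketch below is plain C# and only illustrates that idea with a pairwise AUC. It is not ML.NET's implementation, and `AucSketch` and `Compute` are hypothetical names made up for this example.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not part of ML.NET.
public static class AucSketch
{
    // Pairwise AUC: the fraction of (positive, negative) pairs in which the
    // positive row gets the higher score. Ties count as half a win.
    public static double Compute(float[] labels, float[] scores)
    {
        if (labels.Length != scores.Length)
            throw new ArgumentException("labels and scores must have the same length");

        int positives = labels.Count(l => l > 0);
        int negatives = labels.Length - positives;

        // With no positive rows there are no pairs to compare, so the ratio
        // below would be 0 / 0. This is the situation the error describes.
        if (positives == 0)
            throw new ArgumentOutOfRangeException(nameof(labels),
                "AUC is not defined when there is no positive class in the data");
        if (negatives == 0)
            throw new ArgumentOutOfRangeException(nameof(labels),
                "AUC is not defined when there is no negative class in the data");

        double wins = 0;
        for (int i = 0; i < labels.Length; i++)
        {
            if (labels[i] <= 0) continue;          // i must be a positive row
            for (int j = 0; j < labels.Length; j++)
            {
                if (labels[j] > 0) continue;       // j must be a negative row
                if (scores[i] > scores[j]) wins += 1.0;
                else if (scores[i] == scores[j]) wins += 0.5;
            }
        }
        return wins / ((double)positives * negatives);
    }
}
```

With labels {0, 0, 1, 1} and scores {0.1, 0.4, 0.35, 0.8} this gives 0.75, because three of the four positive/negative pairs are ordered correctly. An all-zero label array throws, which matches the situation in the question: the test set needs at least one row labelled as an anomaly before AUC can be evaluated.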