I am trying to create an FAQ bot with ML.NET (I cannot use QnA Maker). I want to compare an input against the questions in the FAQ knowledge base and return the most relevant answer. Most FAQ bots I found online work like this: featurize the FAQ questions, featurize the input, compute the cosine similarity between the input and each question, and return the answer for the closest match. I don't really understand Microsoft's featurization, and I can't even test it because I can't find how to relate the feature vector back to the original text.
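For reference, the comparison step I have in mind could be sketched roughly as below. This is plain C# with no ML.NET dependency. The names `Similarity`, `Cosine`, and `MostSimilar` are made up for illustration, and the sketch assumes every question has already been turned into a `float[]` of the same length.

```csharp
using System;

public static class Similarity
{
    // Cosine similarity of two equal-length vectors.
    // Returns 0 when either vector is all zeros, to avoid dividing by zero.
    public static double Cosine(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        if (normA == 0 || normB == 0)
            return 0;

        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Index of the candidate most similar to the query,
    // or -1 when there are no candidates.
    public static int MostSimilar(float[] query, float[][] candidates)
    {
        int best = -1;
        double bestScore = double.NegativeInfinity;
        for (int i = 0; i < candidates.Length; i++)
        {
            double score = Cosine(query, candidates[i]);
            if (score > bestScore)
            {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }
}
```

The index returned by `MostSimilar` would then be used to look up the matching answer, provided the candidate vectors are kept in the same order as the FAQ entries.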
This is what I have so far (in Main):
// Requires: using System; using Microsoft.ML; using Microsoft.ML.Data;
var mlContext = new MLContext(seed: 0);
IDataView dataview = mlContext.Data.LoadFromTextFile<SampleData>("Data/training_data.tsv", hasHeader: true);
var textPipeline = mlContext.Transforms.Text.FeaturizeText("Features", "Question");
var textTransformer = textPipeline.Fit(dataview);
var predictionEngine = mlContext.Model.CreatePredictionEngine<SampleData, TransformedTextData>(textTransformer);
SampleData sampleData = new SampleData()
{
    Question = "Setting Up Data Exchange" // would be changed to user input
};
var prediction = predictionEngine.Predict(sampleData);
Console.WriteLine($"Number of Features: {prediction.Features.Length}");
Console.Write("Features: ");
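// Print at most the first 1000 values; the vector may be shorter than that.
for (int i = 0; i < Math.Min(1000, prediction.Features.Length); i++)
    Console.Write($"{prediction.Features[i]:F4} ");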
SampleData class:
public class SampleData
{
    [LoadColumn(0)]
    public string Question { get; set; }

    [LoadColumn(1)]
    public string Answer { get; set; }
}

public class TransformedTextData : SampleData
{
    public float[] Features { get; set; }
}
It returns the feature vector, but almost all of the values are zero. Hopefully that's normal, but I don't know how to turn this into readable output. I also don't understand why I can't just featurize and model the FAQ text on its own. Why do I need a sample question? That feels inefficient, and I'm probably not going about it right. Thanks for any help!
I don't think ML.NET can actually do what I wanted. It turns out that modifying this tutorial to fit my needs worked well enough.
Basically, ML.NET can't featurize a piece of text in isolation; the text has to be featurized in the context of training.
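As a rough sketch of what featurizing in the context of training can look like: the featurizer is fit on the FAQ questions, and the fitted transformer is then applied to the same data so that each row carries its original text next to its feature vector. This assumes the `Microsoft.ML` NuGet package is referenced. `FaqEntry`, `FeaturizedFaqEntry`, and `FaqFeaturizer` are hypothetical names that mirror the `SampleData` and `TransformedTextData` classes from the question, and the entries are loaded from memory rather than from the TSV file. It is not a claim that the resulting vectors are good enough for FAQ matching.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

// Hypothetical stand-in for SampleData, without the file-loading attributes.
public class FaqEntry
{
    public string Question { get; set; }
    public string Answer { get; set; }
}

// Hypothetical stand-in for TransformedTextData.
public class FeaturizedFaqEntry : FaqEntry
{
    public float[] Features { get; set; }
}

public static class FaqFeaturizer
{
    // Fits FeaturizeText on the FAQ questions, then transforms the same rows.
    // Each returned row keeps Question and Answer alongside its Features.
    public static List<FeaturizedFaqEntry> Featurize(
        MLContext mlContext,
        IEnumerable<FaqEntry> entries,
        out ITransformer transformer)
    {
        IDataView data = mlContext.Data.LoadFromEnumerable(entries);

        var pipeline = mlContext.Transforms.Text.FeaturizeText("Features", "Question");

        // The vocabulary behind the feature vector is learned here.
        transformer = pipeline.Fit(data);

        IDataView transformed = transformer.Transform(data);

        return mlContext.Data
            .CreateEnumerable<FeaturizedFaqEntry>(transformed, reuseRowObject: false)
            .ToList();
    }
}
```

Calling `FaqFeaturizer.Featurize(new MLContext(seed: 0), entries, out var transformer)` would return one row per FAQ entry. The same `transformer` could then be passed to `CreatePredictionEngine` to featurize user input, so that both sides share one vocabulary and one vector length.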