How to turn featurized text back to actual text in ML.NET (for chatbot)?


I am trying to create an FAQ bot with ML.NET (I cannot use QnA Maker). I want to compare the questions in the FAQ knowledge base to a user's input and return the most relevant answer. Most FAQ bots I found online work like this: featurize the FAQ questions, featurize the input, compute the cosine similarity between them, and return the answer for the best match. I don't really understand Microsoft's featurization, and I can't even test it because I can't find a way to relate a feature vector back to the original text.

This is what I have so far (in Main):

// Requires: using System; using Microsoft.ML;
var mlContext = new MLContext(seed: 0);
IDataView dataview = mlContext.Data.LoadFromTextFile<SampleData>("Data/training_data.tsv", hasHeader: true);
// Fit the text featurizer on the FAQ questions.
var textPipeline = mlContext.Transforms.Text.FeaturizeText("Features", "Question");
var textTransformer = textPipeline.Fit(dataview);
var predictionEngine = mlContext.Model.CreatePredictionEngine<SampleData, TransformedTextData>(textTransformer);
SampleData sampleData = new SampleData()
{
    Question = "Setting Up Data Exchange" // would be changed to user input
};
var prediction = predictionEngine.Predict(sampleData);
Console.WriteLine($"Number of Features: {prediction.Features.Length}");
Console.Write("Features: ");
// Print at most the first 1000 values so a shorter vector does not throw.
for (int i = 0; i < Math.Min(1000, prediction.Features.Length); i++)
    Console.Write($"{prediction.Features[i]:F4}  ");

SampleData class:

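// Requires: using Microsoft.ML.Data; (for LoadColumn)
public class SampleData
{
    [LoadColumn(0)]
    public string Question { get; set; }

    [LoadColumn(1)]
    public string Answer { get; set; }
}

// Adds the featurized output column to the input schema.
public class TransformedTextData : SampleData
{
    public float[] Features { get; set; }
}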

It returns the feature vector, but almost all of the values are zero. Hopefully that's normal, but I don't know how to turn this into readable output. I also don't understand why I can't just featurize and model the FAQ text on its own. Why do I need a sample question? That feels inefficient, and I'm probably not going about it the right way. Thanks for any help!
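
For reference, the comparison step described above (cosine similarity between the input's feature vector and each FAQ question's vector) might look roughly like the sketch below. It is plain C# with no ML.NET calls. `FaqMatcher`, `CosineSimilarity`, and `MostSimilarIndex` are hypothetical names introduced here for illustration, and the sketch assumes every vector was produced by the same fitted transformer, so all of them have the same length.

```csharp
using System;

// Hypothetical helper for illustration; not part of ML.NET.
public static class FaqMatcher
{
    // Cosine similarity of two equal-length vectors.
    // Returns 0 when either vector is all zeros, to avoid dividing by zero.
    public static float CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        if (normA == 0 || normB == 0)
            return 0f;

        return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
    }

    // Index of the FAQ vector most similar to the input, or -1 if there are none.
    public static int MostSimilarIndex(float[] input, float[][] faqVectors)
    {
        int best = -1;
        float bestScore = float.NegativeInfinity;
        for (int i = 0; i < faqVectors.Length; i++)
        {
            float score = CosineSimilarity(input, faqVectors[i]);
            if (score > bestScore)
            {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }
}
```

With this approach the feature vector is never decoded back into text. The returned index points at a row in the original FAQ list, so the readable question and answer come from keeping that list alongside the vectors. Mostly-zero vectors are not a problem for this calculation.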


Solution

  • I don't think ML.NET can actually do what I wanted. It turns out that modifying this tutorial to fit my needs worked well enough.

    Basically, ML.NET can't just featurize a standalone section of text; the text has to be featurized in the context of training a model.