
Problem using the PHP-ML persistency feature: the restored model does not work as expected


I am building a model that classifies sentences as positive or negative. After the model is trained successfully, I save it using PHP-ML's persistency feature (ModelManager). The model is saved without any problem.

The problem is that when I restore the model from the file and try to predict, the predict method throws the following error:

Type: Phpml\Exception\InvalidArgumentException
Message: Missing feature. All samples must have equal number of features.

My example code is as follows:

require_once __DIR__ . '/vendor/autoload.php';

use Phpml\Classification\NaiveBayes;
use Phpml\FeatureExtraction\TokenCountVectorizer;
use Phpml\Tokenization\WhitespaceTokenizer;
use Phpml\ModelManager;

$modelPath = '/var/www/ai/models/sentiment_analyzer';
$modelManager = new ModelManager();
if( ! file_exists($modelPath) ) {

    // Training data
    $sentences = [
       'I love this product!',
       'This is terrible, do not buy it.',
       'The customer service was amazing.',
       'I am very disappointed with this purchase.',
       'The quality of this item is excellent.',
       'I would recommend this to anyone.',
       'This product is a waste of money.',
       'The shipping was fast and efficient.',
       'I regret buying this product.',
       'This is the best product I have ever used.'
    ];
    
    $labels = [
       'positive',
       'negative',
       'positive',
       'negative',
       'positive',
       'positive',
       'negative',
       'positive',
       'negative',
       'positive'
    ];

    // Vectorize the sentences
    $vectorizer = new TokenCountVectorizer(new WhitespaceTokenizer());
    $vectorizer->fit($sentences);
    $vectorizer->transform($sentences);

    // Train the Naive Bayes classifier
    $classifier = new NaiveBayes();
    $classifier->train($sentences, $labels);

    $modelManager->saveToFile($classifier, $modelPath);
}
else
{
    $classifier = $modelManager->restoreFromFile($modelPath);
}


// Predict the sentiment of a new sentence
$newSentence = 'This product is not worth the money.';
$newSentenceVector = $vectorizer->transform([$newSentence])[0];
$predictedLabel = $classifier->predict($newSentenceVector);

echo 'Prediction : ' . $predictedLabel;

Please note that if I do not save the model, the prediction works fine without any errors.


Solution

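  • I've been battling with something similar and I cracked it this morning. There are a couple of issues with your code, particularly $vectorizer->transform([$newSentence])[0]. This doesn't make sense, since transform() doesn't return anything: it takes the samples array by reference and modifies it in place. Also, in the branch that restores the model from file, $vectorizer is never created at all.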

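    But all of that aside: when you want to run a prediction, it seems you must use the same TokenCountVectorizer instance that you used to train your model. Its fit() call builds the vocabulary, and that vocabulary determines how many features each transformed sample has, so a vectorizer fitted on different text produces vectors with a different number of features than the classifier was trained on.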
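
    To make the feature-count mismatch concrete, here is a toy illustration in plain PHP. It is not PHP-ML code: buildVocabulary() and countTokens() are hypothetical helpers that only mimic, roughly, what a count vectorizer does when it is fitted and then used to transform a sentence.

    ```php
    <?php
    // Toy stand-in for a count vectorizer (hypothetical helpers, NOT PHP-ML code).

    // "Fit": assign a column index to every distinct whitespace-separated token.
    function buildVocabulary(array $sentences): array
    {
        $vocabulary = [];
        foreach ($sentences as $sentence) {
            foreach (preg_split('/\s+/', trim($sentence)) as $token) {
                if (!isset($vocabulary[$token])) {
                    $vocabulary[$token] = count($vocabulary); // next free column
                }
            }
        }
        return $vocabulary;
    }

    // "Transform": count tokens into a vector with one column per vocabulary entry.
    function countTokens(string $sentence, array $vocabulary): array
    {
        $counts = array_fill(0, count($vocabulary), 0);
        foreach (preg_split('/\s+/', trim($sentence)) as $token) {
            if (isset($vocabulary[$token])) {
                $counts[$vocabulary[$token]]++; // tokens unseen during fit are ignored
            }
        }
        return $counts;
    }

    $training = ['I love this product!', 'This is terrible, do not buy it.'];
    $newSentence = 'This product is not worth the money.';

    $trainedVocabulary = buildVocabulary($training);     // what the classifier was trained against
    $freshVocabulary = buildVocabulary([$newSentence]);  // what a brand-new vectorizer would learn

    $good = countTokens($newSentence, $trainedVocabulary);
    $bad = countTokens($newSentence, $freshVocabulary);

    echo count($good), ' features vs ', count($bad), " features\n";
    // prints "11 features vs 7 features"
    ```

    Only the vector built from the training vocabulary has the width the classifier saw during training, which is why the fitted vectorizer has to be kept together with the model.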

    The simplest way to reuse the fitted vectorizer is to store the entire pipeline rather than just your classifier (NaiveBayes in this case).

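    Here is the basic pseudocode to get there (not taken from, or relating to, your code):

    use Phpml\Classification\NaiveBayes;
    use Phpml\FeatureExtraction\StopWords\English;
    use Phpml\FeatureExtraction\TfIdfTransformer;
    use Phpml\FeatureExtraction\TokenCountVectorizer;
    use Phpml\ModelManager;
    use Phpml\Pipeline;
    use Phpml\Tokenization\WordTokenizer;

    // The vectorizer and the TF-IDF transformer live inside the pipeline,
    // so they are saved and restored together with the classifier.
    $pipeline = new Pipeline( [
        new TokenCountVectorizer( new WordTokenizer(), new English() ),
        new TfIdfTransformer()
    ], new NaiveBayes() );

    // ... do whatever training you need to (e.g. $pipeline->train($samples, $labels)), then save your pipeline ...

    $modelManager = new ModelManager();
    $modelManager->saveToFile( $pipeline, 'my_entire_pipeline.phpml' );

    // ... a few moments later ...

    // When you need to predict, reload your pipeline:
    $modelManager = new ModelManager();
    $pipeline = $modelManager->restoreFromFile('my_entire_pipeline.phpml');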
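
    Applied to the sentiment example from the question, the same idea might look like the sketch below. It assumes PHP-ML is installed through Composer (so vendor/autoload.php exists next to the script) and that the Pipeline class exposes train() and predict(); the file name in the system temp directory is a made-up placeholder. Treat it as an outline under those assumptions rather than a tested drop-in replacement.

    ```php
    <?php
    require_once __DIR__ . '/vendor/autoload.php'; // assumes a Composer install of PHP-ML

    use Phpml\Classification\NaiveBayes;
    use Phpml\FeatureExtraction\TokenCountVectorizer;
    use Phpml\ModelManager;
    use Phpml\Pipeline;
    use Phpml\Tokenization\WhitespaceTokenizer;

    // Hypothetical location; any writable path works.
    $modelPath = sys_get_temp_dir() . '/sentiment_pipeline.phpml';
    $modelManager = new ModelManager();

    if (!file_exists($modelPath)) {
        // Training data copied from the question.
        $sentences = [
            'I love this product!',
            'This is terrible, do not buy it.',
            'The customer service was amazing.',
            'I am very disappointed with this purchase.',
            'The quality of this item is excellent.',
            'I would recommend this to anyone.',
            'This product is a waste of money.',
            'The shipping was fast and efficient.',
            'I regret buying this product.',
            'This is the best product I have ever used.',
        ];
        $labels = [
            'positive', 'negative', 'positive', 'negative', 'positive',
            'positive', 'negative', 'positive', 'negative', 'positive',
        ];

        // The vectorizer is fitted inside train(), so its vocabulary is
        // stored in the same file as the classifier.
        $pipeline = new Pipeline(
            [new TokenCountVectorizer(new WhitespaceTokenizer())],
            new NaiveBayes()
        );
        $pipeline->train($sentences, $labels);
        $modelManager->saveToFile($pipeline, $modelPath);
    } else {
        $pipeline = $modelManager->restoreFromFile($modelPath);
    }

    // Pass raw sentences: the pipeline vectorizes them with the fitted
    // vocabulary before handing them to the classifier.
    $predicted = $pipeline->predict(['This product is not worth the money.']);

    echo 'Prediction : ' . $predicted[0] . PHP_EOL;
    ```

    Because predict() receives an array of sentences, it returns an array of labels, hence the [0]. Both branches go through the same fitted vectorizer, so the first run (train and save) and later runs (restore) produce vectors of the same width.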
    

    I battled with this until grey matter started to ooze out of my ears, then woke up this morning and decided to try persisting the entire Pipeline instead of just the NaiveBayes classifier.

    This was the best reference to lay all of this out to help me get there: https://arkadiuszkondas.com/text-data-classification-with-bbc-news-article-dataset/