
How to pass a parameter to Rubix ML TextNormalizer->transform() method? Arguments cannot be passed by reference


I've got the command below:

<?php

namespace App\Console\Commands;

use Illuminate\Console\Command;

use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Extractors\CSV;
use Rubix\ML\Transformers\TextNormalizer;
use Rubix\ML\Transformers\WordCountVectorizer;
use Rubix\ML\Transformers\TfIdfTransformer;
use Rubix\ML\Persisters\Filesystem;
use Rubix\ML\Classifiers\KNearestNeighbors;
use Rubix\ML\CrossValidation\Reports\MulticlassBreakdown;
use Rubix\ML\CrossValidation\Reports\ConfusionMatrix;

use Log;
class PredictorCommand extends Command
{
    /**
     * The name and signature of the console command.
     *
     * @var string
     */
    protected $signature = 'command:predictor';

    /**
     * The console command description.
     *
     * @var string
     */
    protected $description = 'Command description';

    /**
     * Execute the console command.
     *
     * @return int
     */
    public function handle()
    {
        $mlModel = new PredictorCommand();

        echo "Please enter your text: ";
        $inputText = readline();
        $prediction = $mlModel->predict($inputText);

        echo "Prediction: {$prediction}";
    }

    public function predict($inputText)
    {
        // Load the trained model
        $modelPath = '/var/www/gdpr/tests/model.rbx';
        $modelData = file_get_contents($modelPath);
        $estimator = unserialize($modelData);

        // Preprocess the input text
        $textNormalizer = new TextNormalizer();
        $wordCountVectorizer = new WordCountVectorizer(10000);
        $tfIdfTransformer = new TfIdfTransformer();

        $text = $inputText;
        $text = $textNormalizer->transform([$text]);
        $text = $wordCountVectorizer->fit($text)->transform($text);
        $text = $tfIdfTransformer->fit($text)->transform($text);

        // Make a prediction
        $prediction = $estimator->predictSample($text[0]);

        return $prediction;
    }

}

When running it I'm getting the following error:

Please enter your text: sd

   Error 

  Rubix\ML\Transformers\TextNormalizer::transform(): Argument #1 ($samples) cannot be passed by reference

  at app/Console/Commands/PredictorCommand.php:63
     59▕         $wordCountVectorizer = new WordCountVectorizer(10000);
     60▕         $tfIdfTransformer = new TfIdfTransformer();
     61▕ 
     62▕         $text = $inputText;
  ➜  63▕         $text = $textNormalizer->transform([$text]);
     64▕         $text = $wordCountVectorizer->fit($text)->transform($text);
     65▕         $text = $tfIdfTransformer->fit($text)->transform($text);
     66▕ 
     67▕         // Make a prediction

  1   app/Console/Commands/PredictorCommand.php:45
      App\Console\Commands\PredictorCommand::predict()

      +13 vendor frames 
  15  artisan:37
      Illuminate\Foundation\Console\Kernel::handle()

Why is it being passed by reference? I thought array assignment always copied the value.

If I run the code below:

$text = [$inputText];
$text = $textNormalizer->transform($text); 

The error I get then is:

Rubix\ML\Transformers\TextNormalizer::normalize(): Argument #1 ($sample) must be of type array, string given

If I do this instead:

 $text[] = [$inputText];
 $text = $textNormalizer->transform($text); 

The call works, but $text is null, so I cannot perform the following action, which is $wordCountVectorizer->fit($text)->transform($text);


Solution

  • Note: I have not used this package (Rubix ML) before, nor am I active in machine learning, but the issues seem purely PHP related. This answer can therefore help get the code running, but that doesn't mean it will produce the desired result.

    I have looked at the source code of TextNormalizer. The transform method takes an array of samples by reference and hands each sample to normalize, which also requires an array, so the argument must be an array of arrays. Because the parameter is a reference, you can only pass a variable, not a literal such as [$text]. This is why your last try ($text[] = [$inputText]) gets past the call: it stores the required nested structure in a variable. The transform method also returns void and modifies the array in place, so if you assign its return value to $text, $text will always be null.
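A minimal plain-PHP sketch of the three cases above may make the behaviour easier to see. The functions transformSamples and normalizeSample are hypothetical stand-ins that only mimic the shape of the signatures reported in the error messages (a by-reference array parameter, a void return, and an inner array type hint); they are not the Rubix ML implementation.

```php
<?php

// Hypothetical stand-ins that mimic the shape of TextNormalizer's methods.
// They are NOT the Rubix ML implementation.
function normalizeSample(array $sample): array
{
    // Lowercase every string feature of a single sample.
    return array_map(
        fn ($value) => is_string($value) ? strtolower($value) : $value,
        $sample
    );
}

function transformSamples(array &$samples): void
{
    // Each element must itself be an array (one sample = one row of features).
    foreach ($samples as &$sample) {
        $sample = normalizeSample($sample);
    }
    unset($sample); // break the reference left over from the loop
}

// 1. A literal is not a variable, so PHP cannot bind a reference to it.
//    Uncommenting the next line fails with a "passed by reference" error:
// transformSamples([['Hello World']]);

// 2. A flat array is a valid variable, but each element is a string,
//    so the inner array type hint rejects it.
$flat = ['Hello World'];
$flatError = null;
try {
    transformSamples($flat);
} catch (TypeError $e) {
    $flatError = $e->getMessage();
}

// 3. A variable holding an array of samples works and is modified in place.
//    The function is void, so the assigned "result" is null.
$samples = [['Hello World']];
$returned = transformSamples($samples);
```

After the third call, $samples holds the lowercased text while $returned is null, which matches what happened when the return value of transform was assigned back to $text.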

    The next part is the WordCountVectorizer. Its fit method expects a Dataset object rather than a plain array. According to the docs, a dataset can be either labeled or unlabeled. For my example I will use an unlabeled dataset, but which one you need is up to you. The transform method takes an array by reference, like before.

    The last step, the TfIdfTransformer, works the same way as the WordCountVectorizer. Therefore, we can do the following:

    // One sample containing one text feature: an array of arrays
    $text = [[$inputText]];
    // $text is passed by reference, so it is updated in place (transform returns void)
    $textNormalizer->transform($text);
    // Create a dataset from the normalized text
    $dataset = new \Rubix\ML\Datasets\Unlabeled($text);
    // fit() takes the dataset; transform() again updates $text in place
    $wordCountVectorizer->fit($dataset);
    $wordCountVectorizer->transform($text);
    // Rebuild the dataset from the vectorized $text
    $dataset = new \Rubix\ML\Datasets\Unlabeled($text);
    // Same pattern: fit on the dataset, then transform $text in place
    $tfIdfTransformer->fit($dataset);
    $tfIdfTransformer->transform($text);
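
The fit-then-transform pattern used above can be illustrated in plain PHP. TinyWordCounter below is a hypothetical, heavily simplified stand-in for a stateful transformer such as WordCountVectorizer; as an assumption made to keep the sketch self-contained, its fit method takes a plain array instead of a Dataset object. It only shows why fit has to run before transform and why the samples variable changes without any return value being used.

```php
<?php

// Hypothetical, simplified stand-in for a stateful transformer.
// It is NOT the Rubix ML WordCountVectorizer.
final class TinyWordCounter
{
    /** @var string[] Vocabulary learned by fit(), in first-seen order. */
    private array $vocabulary = [];

    /** Learn the vocabulary from every text sample; returns nothing. */
    public function fit(array $samples): void
    {
        $seen = [];
        foreach ($samples as $sample) {
            foreach (explode(' ', $sample[0]) as $word) {
                $seen[$word] = true;
            }
        }
        $this->vocabulary = array_keys($seen);
    }

    /** Replace each text sample with its word counts, in place. */
    public function transform(array &$samples): void
    {
        foreach ($samples as &$sample) {
            $counts = array_count_values(explode(' ', $sample[0]));
            // One count per vocabulary word, 0 when the word is absent.
            $sample = array_map(
                fn (string $word): int => $counts[$word] ?? 0,
                $this->vocabulary
            );
        }
        unset($sample); // break the reference left over from the loop
    }
}

$samples = [['red green red'], ['green blue']];

$counter = new TinyWordCounter();
$counter->fit($samples);       // learns the vocabulary: red, green, blue
$counter->transform($samples); // $samples now holds count vectors
```

Here $samples ends up as [[2, 1, 0], [0, 1, 1]]. Calling transform before fit in this sketch would produce empty vectors, because the vocabulary would still be empty.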
    

    Again, this should get your code up and running, but I cannot guarantee the result is as desired.