I've got the command below:
<?php
namespace App\Console\Commands;
use Illuminate\Console\Command;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Extractors\CSV;
use Rubix\ML\Transformers\TextNormalizer;
use Rubix\ML\Transformers\WordCountVectorizer;
use Rubix\ML\Transformers\TfIdfTransformer;
use Rubix\ML\Persisters\Filesystem;
use Rubix\ML\Classifiers\KNearestNeighbors;
use Rubix\ML\CrossValidation\Reports\MulticlassBreakdown;
use Rubix\ML\CrossValidation\Reports\ConfusionMatrix;
use Log;
class PredictorCommand extends Command
{
/**
* The name and signature of the console command.
*
* @var string
*/
protected $signature = 'command:predictor';
/**
* The console command description.
*
* @var string
*/
protected $description = 'Command description';
/**
* Execute the console command.
*
* @return int
*/
public function handle()
{
$mlModel = new PredictorCommand();
echo "Please enter your text: ";
$inputText = readline();
$prediction = $mlModel->predict($inputText);
echo "Prediction: {$prediction}";
}
public function predict($inputText)
{
// Load the trained model
$modelPath = '/var/www/gdpr/tests/model.rbx';
$modelData = file_get_contents($modelPath);
$estimator = unserialize($modelData);
// Preprocess the input text
$textNormalizer = new TextNormalizer();
$wordCountVectorizer = new WordCountVectorizer(10000);
$tfIdfTransformer = new TfIdfTransformer();
$text = $inputText;
$text = $textNormalizer->transform([$text]);
$text = $wordCountVectorizer->fit($text)->transform($text);
$text = $tfIdfTransformer->fit($text)->transform($text);
// Make a prediction
$prediction = $estimator->predictSample($text[0]);
return $prediction;
}
}
When running it I'm getting the following error:
Please enter your text: sd
Error
Rubix\ML\Transformers\TextNormalizer::transform(): Argument #1 ($samples) cannot be passed by reference
at app/Console/Commands/PredictorCommand.php:63
59▕ $wordCountVectorizer = new WordCountVectorizer(10000);
60▕ $tfIdfTransformer = new TfIdfTransformer();
61▕
62▕ $text = $inputText;
➜ 63▕ $text = $textNormalizer->transform([$text]);
64▕ $text = $wordCountVectorizer->fit($text)->transform($text);
65▕ $text = $tfIdfTransformer->fit($text)->transform($text);
66▕
67▕ // Make a prediction
1 app/Console/Commands/PredictorCommand.php:45
App\Console\Commands\PredictorCommand::predict()
+13 vendor frames
15 artisan:37
Illuminate\Foundation\Console\Kernel::handle()
Why is it being passed by reference? I thought array assignment always copied the value.
If I run the code below:
$text = [$inputText];
$text = $textNormalizer->transform($text);
The error I get then is:
Rubix\ML\Transformers\TextNormalizer::normalize(): Argument #1 ($sample) must be of type array, string given
If I do this instead:
$text[] = [$inputText];
$text = $textNormalizer->transform($text);
The call works, but $text is null afterwards, so I cannot perform the next step, which is $wordCountVectorizer->fit($text)->transform($text);
Note: I have not used this package (Rubix ML) before, nor am I active in machine learning, but the issues seem purely PHP related. This answer can therefore help get the code running, but that doesn't mean it will produce the desired result.
I have looked at the source code of TextNormalizer. The transform method you're calling requires an array of samples and passes each sample to another method, normalize, which also requires an array, so the input has to be an array of arrays. The samples are passed by reference, so you can only pass a variable, not a literal such as [$text]. This is why your last attempt ($text[] = [$inputText]) works: it builds the required nested structure in a variable. Also, transform returns void, so if you assign its return value to $text, $text will always be null.
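The behaviour described above can be reproduced without the library. The following is a minimal sketch in plain PHP; transformSamples and normalizeSample are hypothetical stand-ins that only mimic the by-reference, void signatures described here, not the actual Rubix ML code.

```php
<?php
// Hypothetical stand-in for a per-sample normalizer: takes one sample
// (an array of features) by reference and lowercases its string values.
function normalizeSample(array &$sample): void
{
    foreach ($sample as &$value) {
        if (is_string($value)) {
            $value = strtolower($value);
        }
    }
    unset($value); // break the reference left over from the loop
}

// Hypothetical stand-in for a transformer: takes the whole list of
// samples by reference, modifies it in place, and returns nothing.
function transformSamples(array &$samples): void
{
    foreach ($samples as &$sample) {
        // Each element must itself be an array; a plain string here
        // would raise a TypeError ("must be of type array, string given").
        normalizeSample($sample);
    }
    unset($sample);
}

// A by-reference parameter needs a variable, so build the nested
// structure (a list of samples, each a list of features) first.
$samples = [['Hello WORLD']];

// The function is void, so assigning its result only ever yields null.
$result = transformSamples($samples);

var_dump($result);  // NULL
print_r($samples);  // the original variable now holds 'hello world'
```

Calling transformSamples([['Hello WORLD']]) with a literal instead of a variable is rejected by PHP, because there is nothing for the reference to bind to. That is the same class of error as the one in the question.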
The next part is the WordCountVectorizer. Its fit method expects a Dataset. According to the docs, datasets can be either labeled or unlabeled. For my example I will use an unlabeled dataset, but which kind you need is up to you. The transform method expects an array, as before.
The last call, on the TfIdfTransformer, works the same way as the WordCountVectorizer. Therefore, we can do the following:
// One sample containing one feature: an array of arrays
$text = [[$inputText]];
// $text is passed by reference, so it is updated in place
$textNormalizer->transform($text);
// Create a dataset from the normalized text
$dataset = new \Rubix\ML\Datasets\Unlabeled($text);
// fit() takes the dataset; transform() again updates $text in place
$wordCountVectorizer->fit($dataset);
$wordCountVectorizer->transform($text);
// Rebuild the dataset from the vectorized $text
$dataset = new \Rubix\ML\Datasets\Unlabeled($text);
// Same pattern: fit on the dataset, then transform $text in place
$tfIdfTransformer->fit($dataset);
$tfIdfTransformer->transform($text);
Again, this should get your code up and running, but I cannot guarantee the result is as desired.