Tags: php, machine-learning, regression, php-ml

php-ml regression predicts weird values


My input values are 1, 2, 3, 4, ... and my output values are 1*1, 2*2, 3*3, 4*4, ... My code looks like this:

$reg = new LeastSquares();

$samples = array();
$targets = array();
for ($i = 1; $i < 100; $i++)
{  
  $samples[] = [$i];
  $targets[] = $i*$i;
}

$reg->train($samples, $targets);
  
echo $reg->predict([5])."\n";
echo $reg->predict([10])."\n";

I expect it to output roughly 25 and 100. But I get:

-1183.3333333333
-683.33333333333

I also tried to use SVR instead of LeastSquares but the values are strange too:

2498.23
2498.23

I am new to ML. What am I doing wrong?


Solution

  • As others have pointed out in the comments, LeastSquares fits a linear model to your data (the training examples).

    Your data set (target = sample^2) is inherently non-linear. If you picture the best possible line (in the least-squares sense, i.e. minimising the sum of squared residuals) drawn through a quadratic curve, that line crosses the y-axis below zero, so it has a negative y-intercept:

    [Figure: sketch of a least-squares line fitted to a quadratic curve, with a negative y-intercept]

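    You've trained your linear model on data up to x=99, y=9801, so the fitted line has to be steep to follow the large values at the upper end, and that pushes its y-intercept to a large negative number. Down at x=5 or x=10 the line is still well below zero, which is why you get the large negative predictions you found.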
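
    To make the size of that intercept concrete, here is a minimal sketch that computes the least-squares line in closed form for the question's data (x = 1..99, y = x^2). It uses plain PHP rather than php-ml, and fitLine() is a hypothetical helper written for this illustration, not a library function.

    ```php
    <?php
    // Hypothetical helper (not part of php-ml): ordinary least squares
    // for a single feature, returning [slope, intercept].
    function fitLine(array $xs, array $ys): array
    {
        $n = count($xs);
        $meanX = array_sum($xs) / $n;
        $meanY = array_sum($ys) / $n;

        $sxy = 0.0; // sum of (x - meanX) * (y - meanY)
        $sxx = 0.0; // sum of (x - meanX)^2
        foreach ($xs as $k => $x) {
            $sxy += ($x - $meanX) * ($ys[$k] - $meanY);
            $sxx += ($x - $meanX) ** 2;
        }

        $slope = $sxy / $sxx;
        $intercept = $meanY - $slope * $meanX;

        return [$slope, $intercept];
    }

    // Same data as in the question: x = 1..99, y = x^2.
    $xs = range(1, 99);
    $ys = array_map(function ($x) { return $x * $x; }, $xs);

    [$slope, $intercept] = fitLine($xs, $ys);

    printf("slope = %.4f, intercept = %.4f\n", $slope, $intercept);
    printf("f(5)  = %.4f\n", $intercept + $slope * 5);
    printf("f(10) = %.4f\n", $intercept + $slope * 10);
    ```

    For this data the slope comes out at 100 and the intercept at about -1683.33, so the line predicts about -1183.33 at x=5 and -683.33 at x=10, which matches the values in the question.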

    If you use support vector regression (SVR) with a degree-2 polynomial kernel, it does a good job of capturing the pattern in your data:

    <?php
    require_once __DIR__ . '/vendor/autoload.php';

    use Phpml\Regression\SVR;
    use Phpml\SupportVectorMachine\Kernel;

    // Training data: x = 1..100, y = x^2
    $samples = [];
    $targets = [];
    for ($i = 1; $i <= 100; $i++) {
        $samples[] = [$i];
        $targets[] = $i * $i;
    }

    // The second constructor argument is the polynomial degree
    $reg = new SVR(Kernel::POLYNOMIAL, 2);
    $reg->train($samples, $targets);

    echo $reg->predict([5]) . "\n";
    echo $reg->predict([10]) . "\n";
    ?>
    

    Returns:

    25.0995
    100.098
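
    A related option, not part of the original answer, is to keep a linear least-squares fit but give it x^2 as the input feature, since the model only has to be linear in its coefficients. The sketch below uses plain PHP with a hypothetical fitLine() helper (not a php-ml function). With php-ml, the analogous step would presumably be to train LeastSquares on samples such as [$i, $i*$i]; that variant is an assumption and is not run here.

    ```php
    <?php
    // Hypothetical helper (not part of php-ml): ordinary least squares
    // for a single feature, returning [slope, intercept].
    function fitLine(array $xs, array $ys): array
    {
        $n = count($xs);
        $meanX = array_sum($xs) / $n;
        $meanY = array_sum($ys) / $n;

        $sxy = 0.0;
        $sxx = 0.0;
        foreach ($xs as $k => $x) {
            $sxy += ($x - $meanX) * ($ys[$k] - $meanY);
            $sxx += ($x - $meanX) ** 2;
        }

        $slope = $sxy / $sxx;

        return [$slope, $meanY - $slope * $meanX];
    }

    // Inputs 1..100; the feature handed to the linear fit is x^2, not x.
    $xs = range(1, 100);
    $features = array_map(function ($x) { return $x * $x; }, $xs);
    $targets  = array_map(function ($x) { return $x * $x; }, $xs);

    [$slope, $intercept] = fitLine($features, $targets);

    // To predict for a new x, apply the same transform before using the line.
    $predict = function ($x) use ($slope, $intercept) {
        return $intercept + $slope * ($x * $x);
    };

    echo $predict(5) . "\n";
    echo $predict(10) . "\n";
    ```

    Because the target here is exactly the squared input, the fit recovers a slope of 1 and an intercept of 0, so the predictions are 25 and 100. The trade-off is that you have to choose the transform yourself, which is the same decision as choosing the polynomial degree.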
    

    From your response in the comments it's clear that you want to use a neural network so that you don't have to decide in advance what degree of model to fit. A neural network with a single hidden layer can approximate any continuous function on a bounded input range arbitrarily well, given enough hidden nodes and enough training data.

    Unfortunately, php-ml doesn't seem to offer an MLP (multilayer perceptron, a common type of feed-forward neural network) for regression out of the box. I'm sure you could build one from the appropriate layers, but if your goal is to get up and running with regression models quickly, that might not be the best approach.