php, machine-learning, linear-regression, php-ml

Recommendation Engine with PHP-ML and regression


I am trying to find out how to use PHP-ML to recommend items to the current customer.

My dataset (the numbering is just the row number):

  1. Product 1 was purchased together with Product 2
  2. Product 1 was purchased together with Product 2
  3. Product 1 was purchased together with Product 3
  4. Product 1 was purchased together with Product 2
  5. Product 2 was purchased together with Product 4
  6. Product Y.. was purchased together with Product X..

As a customer, I bought Product 1 in the past. So I would expect to see Product 2 in my recommendation box, because 3 people bought it together with Product 1.

I think I need some regression algorithm here that gives me a correlation value between product X and product Y.

I thought about the linear SVR algorithm, but I have no idea how to train it.

// Step 1: Load the Dataset
// Step 2: Prepare the Dataset
// Step 3: Generate the training/testing Dataset
$samples = [[1,2], [1,2], [1,3], [1,2], [2,4], [X,Y..]];
$targets = [?, ?, ? , ? , ? , ?];

$regression = new LeastSquares();
// Step 4: Train the classifier
$regression->train($samples, $targets);


echo $regression->predict([1,2]);

In my mind I should get some value like 0.25, meaning that 25% of the customers who bought product 1 also bought product 2. Then I could sort my predictions and use that order in my recommendation box. My main question is: what should I use for training? Or am I misunderstanding something completely?

Thank you


Solution

  • First of all, you don't need linear regression here. If you did, you would first have to convert the categorical product IDs into numeric features before you could make a numeric prediction. Typically you would use dummy variables (one-hot encoding), which means your table would change from:

    | Product A | Product B |
    |-----------|-----------|
    |         1 |         2 |
    |         1 |         2 |
    |         1 |         3 |
    |         1 |         2 |
    |         2 |         4 |
    

    to something like:

    | Product 1 | Product 2 | Product 3 | Product 4 |
    |-----------|-----------|-----------|-----------|
    |         1 |         1 |         0 |         0 |
    |         1 |         1 |         0 |         0 |
    |         1 |         0 |         1 |         0 |
    |         1 |         1 |         0 |         0 |
    |         0 |         1 |         0 |         1 |
    

    See https://datascience.stackexchange.com/questions/28306/transform-categorical-variables-into-numerical for more info. Sadly, I think PHP-ML does not support categorical data encoding at the moment. If you don't convert the categorical data, you might get something like 1.6 as a prediction, and that wouldn't mean anything useful.
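
    If the encoding has to be done by hand, a minimal sketch in plain PHP could look like the following. The function name `oneHotEncode` and the returned array keys are made up for this illustration and are not part of PHP-ML; it assumes every basket is a list of integer product IDs.

    ```php
    <?php
    // Hypothetical helper (not part of PHP-ML): turn each basket of product IDs
    // into a row of 0/1 dummy variables, one column per distinct product.
    function oneHotEncode(array $baskets): array
    {
        // Collect the distinct product IDs and sort them to fix the column order.
        $products = array_values(array_unique(array_merge(...$baskets)));
        sort($products);

        $rows = [];
        foreach ($baskets as $basket) {
            $row = [];
            foreach ($products as $product) {
                // 1 if the product is in this basket, 0 otherwise
                $row[] = in_array($product, $basket, true) ? 1 : 0;
            }
            $rows[] = $row;
        }

        return ['columns' => $products, 'rows' => $rows];
    }

    $encoded = oneHotEncode([[1, 2], [1, 2], [1, 3], [1, 2], [2, 4]]);
    // $encoded['rows'] now holds one 0/1 row per purchase,
    // with the columns ordered as products 1, 2, 3, 4.
    ```

    The rows could then serve as numeric samples for a regression, although you would still have to decide what the target column is.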

    But there is an easier way to do this in PHP-ML: you can use the Apriori associator. It learns which associations are frequent and predicts them. You can see it in action below.

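    use Phpml\Association\Apriori;
    
    // Each sample is one set of products that were purchased together.
    $samples = [[1,2], [1,2], [1,3], [1,2], [2,4]];
    $labels  = [];  // no labels are needed for association rule learning
    
    // Minimum support and minimum confidence a rule must reach to be kept.
    $associator = new Apriori($support = 0.5, $confidence = 0.5);
    $associator->train($samples, $labels);
    
    var_export($associator->predict([1]));
    // outputs  [[ 2 ]];  The right prediction!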
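
    The question also asks for a score per product so that the recommendation box can be ordered. As a rough sketch in plain PHP (not PHP-ML's API), the confidence of "bought X, so also bought Y" can be computed by counting. The function name `rankCoPurchases` is hypothetical and chosen only for this example.

    ```php
    <?php
    // Hypothetical helper (not part of PHP-ML): for one product, compute the share
    // of its baskets that also contain each other product, highest share first.
    function rankCoPurchases(array $baskets, int $product): array
    {
        $withProduct = 0; // number of baskets that contain $product
        $counts = [];     // other product ID => number of shared baskets
        foreach ($baskets as $basket) {
            if (!in_array($product, $basket, true)) {
                continue;
            }
            $withProduct++;
            foreach (array_unique($basket) as $other) {
                if ($other !== $product) {
                    $counts[$other] = ($counts[$other] ?? 0) + 1;
                }
            }
        }

        $confidence = [];
        foreach ($counts as $other => $count) {
            $confidence[$other] = $count / $withProduct;
        }
        arsort($confidence); // sort by value, descending, keeping product IDs as keys

        return $confidence;
    }

    $ranking = rankCoPurchases([[1, 2], [1, 2], [1, 3], [1, 2], [2, 4]], 1);
    // product 2 => 0.75, product 3 => 0.25
    ```

    With the sample data, product 1 appears in four purchases, three of them together with product 2, so product 2 ranks first with 0.75.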
    

    In addition, when working with machine learning it is useful to split your data into a training set and a test set. That way you can test your ML model directly. This is also implemented in PHP-ML.
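
    To illustrate the idea without the library, here is a small sketch of such a split in plain PHP. The function `trainTestSplit`, its parameters, and the seed value are assumptions made for this example; PHP-ML ships its own splitters (for example `Phpml\CrossValidation\RandomSplit`), so check its documentation for the exact usage.

    ```php
    <?php
    // Hypothetical helper (not PHP-ML's API): shuffle the samples with a fixed
    // seed, then hold out the given fraction as the test set.
    function trainTestSplit(array $samples, float $testSize = 0.3, int $seed = 42): array
    {
        mt_srand($seed);   // fixed seed so the same split can be reproduced
        shuffle($samples); // shuffles the local copy, not the caller's array

        $testCount = (int) round(count($samples) * $testSize);

        return [
            'test'  => array_slice($samples, 0, $testCount),
            'train' => array_slice($samples, $testCount),
        ];
    }

    $split = trainTestSplit([[1, 2], [1, 2], [1, 3], [1, 2], [2, 4]], 0.4);
    // 5 samples with a test size of 0.4 give 2 test samples and 3 training samples
    ```

    You would train only on `$split['train']` and then check the predictions against `$split['test']`.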