Search code examples
c++linear-regressiongradient-descentconvergence

Gradient descent converging towards the wrong value


I'm trying to implement a gradient descent algorithm in C++. Here's the code I have so far :

#include <iostream>

double X[] {163,169,158,158,161,172,156,161,154,145};
double Y[] {52, 68, 49, 73, 71, 99, 50, 82, 56, 46 };
double m, p;
int n = sizeof(X)/sizeof(X[0]);

int main(void) {
    double alpha = 0.00004; // 0.00007;
    m = (Y[1] - Y[0]) / (X[1] - X[0]);
    p = Y[0] - m * X[0];
    for (int i = 1; i <= 8; i++) {
        gradientStep(alpha);
    }
    return 0;
}

double Loss_function(void) {
    double res = 0;
    double tmp;
    for (int i = 0; i < n; i++) {
        tmp =  Y[i] - m * X[i] - p;
        res += tmp * tmp;
    }
    return res / 2.0 / (double)n;
}

void gradientStep(double alpha) {
    double pg = 0, mg = 0;
    for (int i = 0; i < n; i++) {
        pg += Y[i] - m * X[i] - p;
        mg += X[i] * (Y[i] - m * X[i] - p);
    }
    p += alpha * pg / n;
    m += alpha * mg / n;
}

This code converges towards m = 2.79822, p = -382.666, and an error of 102.88. But if I use my calculator to find out the correct linear regression model, I find that the correct values of m and p should respectively be 1.601 and -191.1.

I also noticed that the algorithm won't converge for alpha > 0.00007, which seems quite low, and the value of p barely changes during the 8 iterations (or even after 2000 iterations).

What's wrong with my code?

Here's a good overview of the algorithm I'm trying to implement. The values of theta0 and theta1 are called p and m in my program.

Other implementation in python

More about the algorithm


Solution

  • This link gives a comprehensive view of the algorithm; it turns out I was following a completely wrong approach.

    The following code does not work properly (and I have no plans to work on it further), but should put on track anyone who's confronted to the same problem as me :

    #include <vector>
    #include <iostream>
    
    typedef std::vector<double> vect;
    
    std::vector<double> y, omega(2, 0), omega2(2, 0);;
    std::vector<std::vector<double>> X;
    int n = 10;
    
    int main(void) {
        /* Initialize x so that each members contains (1, x_i) */
        /* Initialize x so that each members contains y_i */
        double alpha = 0.00001;
        display();
        for (int i = 1; i <= 8; i++) {
            gradientStep(alpha);
            display();
        }
        return 0;
    }
    
    double f_function(const std::vector<double> &x) {
        double c;
        for (unsigned int i = 0; i < omega.size(); i++) {
            c += omega[i] * x[i];
        }
        return c;
    }
    
    void gradientStep(double alpha) {
        for (int i = 0; i < n; i++) {
            for (unsigned int j = 0; j < X[0].size(); j++) {
                omega2[j] -= alpha/(double)n * (f_function(X[i]) - y[i]) * X[i][j];
            }
        }
        omega = omega2;
    }
    
    void display(void) {
        double res = 0, tmp = 0;
        for (int i = 0; i < n; i++) {
            tmp = y[i] - f_function(X[i]);
            res += tmp * tmp; // Loss functionn
        }
    
        std::cout << "omega = ";
        for (unsigned int i = 0; i < omega.size(); i++) {
            std::cout << "[" << omega[i] << "] ";
        }
        std::cout << "\tError : " << res * .5/(double)n << std::endl;
    }