Search code examples
cmachine-learninggradient-descent

Gradient descent returning nan


I need to write a function to get a curve fit of a dataset. The code below is what I have. It attempts to use gradient descent to find polynomial coefficients which best fit the data.

//solves for y using the form y = a + bx + cx^2 ...
double calc_polynomial(int degree, double x, double* coeffs) {
    double y = 0;

    for (int i = 0; i <= degree; i++)
        y += coeffs[i] * pow(x, i);

    return y;
}

//find polynomial fit
//returns an array of coefficients degree + 1 long
double* poly_fit(double* x, double* y, int count, int degree, double learningRate, int iterations) {
    double* coeffs = malloc(sizeof(double) * (degree + 1));
    double* sums = malloc(sizeof(double) * (degree + 1));

    for (int i = 0; i <= degree; i++)
        coeffs[i] = 0;

    for (int i = 0; i < iterations; i++) {
        //reset sums each iteration
        for (int j = 0; j <= degree; j++)
            sums[j] = 0;

        //update weights
        for (int j = 0; j < count; j++) {
            double error = calc_polynomial(degree, x[j], coeffs) - y[j];

            //update sums
            for (int k = 0; k <= degree; k++)
                sums[k] += error * pow(x[j], k);
        }

        //subtract sums
        for (int j = 0; j <= degree; j++)
            coeffs[j] -= sums[j] * learningRate;
    }

    free(sums);

    return coeffs;
}

And my testing code:

double x[] = { 0, 1, 2, 3, 4 };
double y[] = { 5, 3, 2, 3, 5 };
int size = sizeof(x) / sizeof(*x);

int degree = 1;
double* coeffs = poly_fit(x, y, size, degree, 0.01, 1000);

for (int i = 0; i <= degree; i++)
    printf("%lf\n", coeffs[i]);

The code above works when degree = 1, but anything higher causes the coefficients to come back as nan.

I've also tried replacing

coeffs[j] -= sums[j] * learningRate;

with

coeffs[j] -= (1/count) * sums[j] * learningRate;

but then I get back 0s instead of nan.

Anyone know what I'm doing wrong?


Solution

  • I tried degree = 2, iteration = 10 and got results other than nan (values around a few thousands) Adding one to iteration seems making magnitude of the results larger by about 3 times after that.

    From this observation, I guessed that the results are being multiplied by count.

    In the expression

    coeffs[j] -= (1/count) * sums[j] * learningRate;
    

    Both of 1 and count are integers, so integer division is done in 1/count and it will become zero if count is larger than 1.

    Instead of that, you can divide the result of multiplication by count.

    coeffs[j] -= sums[j] * learningRate / count;
    

    Another way is using 1.0 (double value) instead of 1.

    coeffs[j] -= (1.0/count) * sums[j] * learningRate;