Search code examples
javalinear-algebramatrix-factorization

Cholesky decomposition generating NaNs in Java


I'm not sure whether this is a maths.se or a SO question, but I'm going with SO as I think it's related to my Java.

I'm following a text book on Gaussian Processes (R&W) and implementing some examples in Java. One common step for several examples is to generate a Cholesky decomposition of a covariance matrix. In my attempt I can get successful results for matrices up to a limited size (33x33). However, for anything larger a NaN appears in the diagonal (at 32,32) and so all subsequent values in the matrix are likewise NaNs.

The code is shown below, and the source of the NaN is indicated in the cholesky method. Essentially the covariance element a[32][32] is 1.0, but the value of sum is a little over this (1.0000001423291431), so the square root is imaginary. So my questions are:

  1. Is this an expected result from linear algebra, or, e.g., an artefact of my implementation?
  2. How is this problem best avoided in practice?

Note that I'm not looking for recommendations of libraries to use. This is simply for my own understanding.

Apologies for the length, but I've tried to provide a complete MWE:

import static org.junit.Assert.assertFalse;

import org.junit.Test;

public class CholeskyTest {

    @Test
    public void testCovCholesky() {
        final int n = 34; // Test passes for n<34
        final double[] xData = getSpread(-5, 5, n);
        double[][] cov = covarianceSE(xData);
        double[][] lower = cholesky(cov);
        for(int i=0; i<n; ++i) {
            for(int j=0; j<n; ++j) {
                assertFalse("NaN at " + i + "," + j, Double.isNaN(lower[i][j]));
            }
        }
    }

    /**
     * Generate n evenly space values from min to max inclusive
     */
    private static double[] getSpread(final double min, final double max, final int n) {
        final double[] values = new double[n];
        final double delta = (max - min)/(n - 1);
        for(int i=0; i<n; ++i) {
            values[i] = min + i*delta;
        }
        return values;
    }

    /**
     * Calculate the covariance matrix for the given observations using
     * the squared exponential (SE) covariance function.
     */
    private static double[][] covarianceSE (double[] v) {
        final int m = v.length;
        double[][] k = new double[m][];
        for(int i=0; i<m; ++i) {
            double vi = v[i];
            double row[] = new double[m];
            for(int j=0; j<m; ++j) {
                double dist = vi - v[j];
                row[j] = Math.exp(-0.5*dist*dist);
            }
            k[i] = row;
        }
        return k;
    }

    /**
     * Calculate lower triangular matrix L such that LL^T = A
     * Using Cholesky decomposition from
     * https://rosettacode.org/wiki/Cholesky_decomposition#Java
     */
    private static double[][] cholesky(double[][] a) {
        final int m = a.length;
        double[][] l = new double[m][m];
        for(int i = 0; i< m;i++){
            for(int k = 0; k < (i+1); k++){
                double sum = 0;
                for(int j = 0; j < k; j++){
                    sum += l[i][j] * l[k][j];
                }
                l[i][k] = (i == k) ? Math.sqrt(a[i][i] - sum) : // Source of NaN at 32,32
                    (1.0 / l[k][k] * (a[i][k] - sum));
            }
        }
        return l;
    }
}

Solution

  • Hmm, I think I've found an answer to my own question, from the same textbook I was following. From R&W p.201:

    In practice it may be necessary to add a small multiple of the identity matrix $\epsilon I$ to the covariance matrix for numerical reasons. This is because the eigenvalues of the matrix K can decay very rapidly [...] and without this stabilization the Cholesky decomposition fails. The effect on the generated samples is to add additional independent noise of variance $epsilon$.

    So the following change seems to be sufficient:

    private static double[][] cholesky(double[][] a) {
        final int m = a.length;
        double epsilon = 0.000001; // Small extra noise value
        double[][] l = new double[m][m];
        for(int i = 0; i< m;i++){
            for(int k = 0; k < (i+1); k++){
                double sum = 0;
                for(int j = 0; j < k; j++){
                    sum += l[i][j] * l[k][j];
                }
                l[i][k] = (i == k) ? Math.sqrt(a[i][i]+epsilon - sum) : // Add noise to diagonal values
                    (1.0 / l[k][k] * (a[i][k] - sum));
            }
        }
        return l;
    }