c libraries linear-algebra sparse-matrix suitesparse

Multiply by supernodal L in CHOLMOD?

How can I multiply by the cholmod_factor L in a supernodal L L^T factorisation? I'd prefer not to convert to simplicial since the supernodal representation results in faster backsolves, and I'd prefer not to make a copy of the factor since two copies might not fit in RAM.

Solution

I wound up understanding the supernodal representation from a nice comment in the supernodal-to-simplicial helper function in t_cholmod_change_factor.c. I paraphrase the comment and add some details below:

A supernodal Cholesky factorisation is represented as a collection of supernodal blocks. The entries of a supernodal block are arranged in column-major order like this 6x4 supernode:

t - - -    (row s[pi[snode+0]])
t t - -    (row s[pi[snode+1]])
t t t -    (row s[pi[snode+2]])
t t t t    (row s[pi[snode+3]])
r r r r    (row s[pi[snode+4]])
r r r r    (row s[pi[snode+5]])

There are unused entries (indicated by the hyphens) in order to make the matrix rectangular.
The column indices are consecutive.
The first ncols row indices are those same consecutive column indices. Later row indices can refer to any row below the t triangle.
The super member has one entry for each supernode; it refers to the first column represented by the supernode.
The pi member has one entry for each supernode; it refers to the first index in the s member where you can look up the row numbers.
The px member has one entry for each supernode; it refers to the first index in the x member where the entries are stored. Again, this is not packed storage.

The following code for multiplication by a cholmod_factor *L appears to work (I only care about int indices and double-precision real entries):

cholmod_dense *mul_L(cholmod_factor *L, cholmod_dense *d) {
  int rows = d->nrow, cols = d->ncol;
  cholmod_dense *ans = cholmod_allocate_dense(rows, cols, rows,
      CHOLMOD_REAL, &comm);
  memset(ans->x, 0, 8 * rows * cols);

  FOR(i, L->nsuper) {
    int *sup = (int *)L->super;
    int *pi = (int *)L->pi;
    int *px = (int *)L->px;
    double *x = (double *)L->x;
    int *ss = (int *)L->s;

    int r0 =  pi[i], r1 =  pi[i+1], nrow = r1 - r0;
    int c0 = sup[i], c1 = sup[i+1], ncol = c1 - c0;
    int px0 = px[i];

    /* TODO: Use BLAS instead. */
    for (int j = 0; j < ncol; j++) {
      for (int k = j; k < nrow; k++) {
        for (int l = 0; l < cols; l++) {
          ((double *)ans->x)[l * rows + ss[r0 + k]] +=
              x[px0 + k + j * nrow] * ((double *)d->x)[l*rows+c0 + j];
        }
      }
    }
  }
  return ans;
}