Comparison of SuperLu and LaPack fails when benchmarking with armadillo

I wanted to compare the speed of the sparse solver of SuperLu and the dense version using LaPack when using spsolve() in armadillo. Thus I wrote this program:

#include "stdafx.h"
#include <iostream>
#include <Windows.h>
#include <armadillo\armadillo>

#define SIZE 2500
#define ROUNDS 2500

int main()
{
    //Time measurement stuff
    LARGE_INTEGER frequency;
    LARGE_INTEGER t1, t2, t3, t4;
    QueryPerformanceFrequency(&frequency);
    //Other stuff
    arma::cx_colvec b = arma::randu<arma::cx_colvec>(SIZE);
    arma::cx_colvec b1 = b, b2 = b;
    arma::sp_cx_mat A = arma::sp_cx_mat(SIZE, SIZE);
    A.diag(-2).fill(-1);
    A.diag(-1).fill(16);
    A.diag(0).fill(-30);
    A.diag(1).fill(16);
    A.diag(2).fill(-1);


    arma::cx_colvec c = arma::zeros<arma::cx_colvec>(SIZE), d = arma::zeros<arma::cx_colvec>(SIZE); 
    QueryPerformanceCounter(&t1);
    for (size_t i = 0; i < ROUNDS; i++)
    {
        if(arma::spsolve(c, A, b1, "superlu") == false)
        {
            std::cout << "Error in round 1\n";
            break;
        }
        b1 = c;
    }
    QueryPerformanceCounter(&t2);
    QueryPerformanceCounter(&t3);
    for (size_t i = 0; i < ROUNDS; i++)
    {
        if(arma::spsolve(d, A, b2, "lapack") == false)
        {
            std::cout << "Error in round 2\n";
            break;
        }
        b2 = d;
    }
    QueryPerformanceCounter(&t4);
    std::cout << "Superlu took " << (t2.QuadPart - t1.QuadPart)*1000.0 / frequency.QuadPart << '\n';
    std::cout << "Lapack took " << (t4.QuadPart - t3.QuadPart)*1000.0 / frequency.QuadPart << '\n';
    std::cout << "Both results are equal: " << arma::approx_equal(b1, b2, "abstol", 1e-5) << '\n';
    return 0;
}

Now for small values of SIZE and ROUND the function approx_equal returns true, but for larger values the results b1 and b2 are not equal anymore according to approx_equal. Why? Could the problem be that my superlu-library is not working properly?

Solution

I wouldn't blame the SuperLU library. The "issue" here seems to be that the smallest eigenvalue of the matrix A gets smaller and smaller for progressively larger and larger values of SIZE. Now, the for loops repeatedly apply inv(A) to a given vector. Since the vector you start with is random, it will have some nonzero "admixture" of the eigenvector of A corresponding to the smallest eigenvalue. If the inversion is repeated many times, this component becomes significantly magnified and thus the individual components of the vector(s) b1/b2 become large.

For example, for SIZE=2000 and ROUNDS=2, I get that the maximum component (in absolute value) of the solution is around 10^9. Your test then seems to prescribe an absolute tolerance of 10^-5. However, for such large numbers, this would mean that 14 significant digits have to match exactly which is almost at the limits of the double precision. In my opinion, given the nature of the numbers one is comparing here, it would make more sense to test, e.g., the relative error with approx_equal(b1, b2, "reldiff", 1E-8).

Also, one should check whether the solution in fact makes sense - for large number of ROUNDS, it will sooner or later overflow. For example already with SIZE=2000 and ROUNDS=80, I get infinities in the b1/b2 vectors...