Search code examples
c++memorysparse-matrixeigen3

Set sparsity pattern of Eigen::SparseMatrix without memory overhead


I need to set sparsity pattern of Eigen::SparseMatrix which I already know (I have unique sorted column indices and row offsets). Obviously it's possible via setFromTriplets but unfortunately setFromTriplets requires a lot of additional memory (at least in my case)

I wrote small example

const long nRows = 5000000;
const long nCols = 100000;
const long nCols2Skip = 1000;
//It's quite big!
const long nTriplets2Reserve = nRows * (nCols / nCols2Skip) * 1.1;
Eigen::SparseMatrix<double, Eigen::RowMajor, long> mat(nRows, nCols);

std::vector<Eigen::Triplet<double, long>> triplets;

triplets.reserve(nTriplets2Reserve);
for(long row = 0; row < nRows; ++row){
    for(long col = 0; col < nCols; col += nCols2Skip){
        triplets.push_back(Eigen::Triplet<double, long>(row, col, 1));
    }
}
std::cout << "filling mat" << std::endl << std::flush;
mat.setFromTriplets(triplets.begin(), triplets.end());

std::cout << "Finished! nnz " << mat.nonZeros() << std::endl;
//Stupid way to check memory consumption
std::cin.get();

In my case this example consumes something about 26Gb at peak (between lines "filling mat" and "Finished") and 18Gb after all. (I made all checks via htop). ~8Gb overhead is quite big for me (in my "real world" task i have bigger overhead).

So i have two questions:

  1. How to fill the sparsity pattern of Eigen::SparseMatrix with as little overhead as possible?
  2. Why does setFromTriplets require so much memory?

Please let me know if my example is wrong.

My Eigen version is 3.3.2

PS Sorry for my English

EDIT: It's looks like inserting (with preallocation) each triplet manually works faster and requires less memory at peak. But I still want to know is it possible to set sparsity pattern manually.


Solution

  • Ad 1: You can be even a bit more efficient than plain insert by using the internal functions startVec and insertBack, if you can guarantee that you insert elements in lexicographical order.

    Ad 2: If you use setFromTriplets you need approximately twice final the size of the matrix (plus the size of your Triplet container), since the elements are first inserted in a transposed version of the matrix, which is then transposed to the final matrix in order to make sure that all inner vectors are sorted. If you know the structure of your matrix ahead, this is obviously quite a waste of memory, but it is intended to work on arbitrary input data.

    In your example you have 5000000 * 100000 / 1000 = 5e8 elements. A Triplet requires 8+8+8 = 24 bytes (making about 12Gb for the vector) and each element of the sparse matrix requires 8+8=16 bytes (one double for the value, one long for the inner index), i.e., about 8Gb per matrix, so in total you require about 28Gb which is about 26 Gib.

    Bonus: If your matrix has some special structure, which can be stored more efficiently, and you are willing to dig deeper into the Eigen internals, you may also consider implementing a new type inheriting from Eigen::SparseBase<> (but I don't recomment this, unless memory/performance is very critical for you, and you are willing to go through a lot of "sparsely" documented internal Eigen code ...). However, in that case it is probably easier to think about what you intend to do with your matrix, and try to implement only special operations for that.