c++ serialization boost sparse-matrix

Boost serialize sparse matrix from Armadillo


I am trying to use the sparse matrix feature in Armadillo and am having trouble serializing it. The matrices I am dealing with are very large and most of their components are zero, so it makes sense to use sp_mat. Here is the code:

#include <iostream>
#include <fstream>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>
#include <armadillo>
#include <boost/serialization/split_member.hpp>

BOOST_SERIALIZATION_SPLIT_FREE(arma::sp_mat)

namespace boost { 
namespace serialization {

template<class Archive>
void save(Archive & ar, const arma::sp_mat &t, unsigned int version)
{
    ar & t.n_rows;
    ar & t.n_cols;
    const double *data = t.memptr();
    for(int K=0; K<t.n_elem; ++K)
        ar & data[K];

}

template<class Archive>
void load(Archive & ar, arma::sp_mat &t, unsigned int version)
{
    int rows, cols;
    ar & rows;
    ar & cols;
    t.set_size(rows, cols);
    double *data = t.memptr();
    for(int K=0; K<t.n_elem; ++K)
        ar & data[K];
}
}} // namespace boost::serialization
int main() {

  arma::mat C(3,3, arma::fill::randu);
  C(1,1) = 0; //example so that a few of the components are u
  C(1,2) = 0;
  C(0,0) = 0;
  C(2,1) = 0;
  C(2,0) = 0;
  arma::sp_mat A = arma::sp_mat(C);

  std::ofstream outputStream;
  outputStream.open("bin.dat");
  std::ostringstream oss;
  boost::archive::binary_oarchive oa(outputStream);
  oa & A;
  outputStream.close();

  arma::sp_mat B;
  std::ifstream inputStream;
  inputStream.open("bin.dat", std::ifstream::in);
  boost::archive::binary_iarchive ia(inputStream);
  ia & B;
  return 0;
}

The current problem is that sp_mat doesn't have a memptr() member, so the element-by-element serialization loops in save() and load() don't work for sp_mat. Does anyone know a workaround? I also find it odd that when I print the components of A individually, even the zeroes seem to still be in memory, although a sparse matrix is supposed to skip them. E.g. I printed A(1,1) and got 0. Here is what A looks like when printed:

[matrix size: 3x3; n_nonzero: 4; density: 44.44%]

     (1, 0)         0.2505
     (0, 1)         0.9467
     (0, 2)         0.2513
     (2, 2)         0.5206

Solution

  • The number of elements in a matrix is always n × m, regardless of the storage strategy (sparse or dense).

    So it should not be surprising that you can read the "0" cells: they may not be stored, but they still matter for computation, so their values must be retrievable.

    Given that, your sketch (with memptr(), which I presume was copied from code written for dense matrices) would always store dense data, because it iterates over all n_elem elements. But a sparse matrix cannot hand out a pointer to contiguous storage of that size: a flat array only tells you which cell is which when the memory layout matches the matrix dimensions directly (dense storage, row-major or column-major).
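To make the distinction between logical size and stored cells concrete, here is a minimal, dependency-free sketch. ToySparse, set(), at() and make_example() are hypothetical names invented for this illustration, and the std::map layout is an assumption chosen for readability; it is not how Armadillo stores sp_mat internally.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

// Hypothetical toy type: keeps only nonzero cells, keyed by (row, col).
struct ToySparse {
    std::size_t n_rows;
    std::size_t n_cols;
    std::map<std::pair<std::size_t, std::size_t>, double> cells;

    ToySparse(std::size_t rows, std::size_t cols) : n_rows(rows), n_cols(cols) {}

    // Logical element count: always rows * cols, whatever is stored.
    std::size_t n_elem() const { return n_rows * n_cols; }

    // Number of cells that actually occupy memory.
    std::size_t n_nonzero() const { return cells.size(); }

    // Storing a zero removes the cell instead of keeping an explicit 0.
    void set(std::size_t r, std::size_t c, double v) {
        assert(r < n_rows && c < n_cols);
        if (v != 0.0) {
            cells[std::make_pair(r, c)] = v;
        } else {
            cells.erase(std::make_pair(r, c));
        }
    }

    // Reading an absent cell yields 0 without anything being stored for it.
    double at(std::size_t r, std::size_t c) const {
        assert(r < n_rows && c < n_cols);
        auto it = cells.find(std::make_pair(r, c));
        return it == cells.end() ? 0.0 : it->second;
    }
};

// Builds the 3x3 pattern printed in the question.
ToySparse make_example() {
    ToySparse m(3, 3);
    m.set(1, 0, 0.2505);
    m.set(0, 1, 0.9467);
    m.set(0, 2, 0.2513);
    m.set(2, 2, 0.5206);
    return m;
}
```

With this layout, at(1, 1) can answer 0 even though nothing is stored for that cell, and a serializer only needs to walk cells (four entries) rather than all n_elem() positions (nine).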

    Based on the information from "Returning locations and values of a sparse matrix in armadillo c++", here's a fixed implementation that:

    • does NOT try to use undocumented implementation details
    • uses the documented interface (it.col(), it.row()) to serialize sparsely
    • works

    Full code (tested on my machine):

    #include <armadillo>
    #include <boost/archive/binary_iarchive.hpp>
    #include <boost/archive/binary_oarchive.hpp>
    #include <boost/serialization/split_free.hpp> // defines BOOST_SERIALIZATION_SPLIT_FREE
    #include <cassert>
    #include <fstream>
    #include <iostream>
    
    BOOST_SERIALIZATION_SPLIT_FREE(arma::sp_mat)
    
    namespace boost { namespace serialization {
    
        template<class Archive>
        void save(Archive & ar, const arma::sp_mat &t, unsigned) {
            ar & t.n_rows & t.n_cols & t.n_nonzero;
    
            for (auto it = t.begin(); it != t.end(); ++it) {
                ar & it.row() & it.col() & *it;
            }
        }
    
        template<class Archive>
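        void load(Archive & ar, arma::sp_mat &t, unsigned) {
            // Read with arma::uword so the widths match what save() wrote.
            arma::uword n_rows, n_cols, nz;
            ar & n_rows & n_cols & nz;
    
            t.zeros(n_rows, n_cols); // resize and discard any old contents
            while (nz--) {
                arma::uword r, c;
                double v;
                ar & r & c & v;
                t(r, c) = v;
            }
        }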
    }} // namespace boost::serialization
    
    int main() {
    
        arma::mat C(3, 3, arma::fill::randu);
        C(0, 0) = 0;
        C(1, 1) = 0; // example so that a few of the components are u
        C(1, 2) = 0;
        C(2, 0) = 0;
        C(2, 1) = 0;
    
        {
            arma::sp_mat const A = arma::sp_mat(C);
            assert(A.n_nonzero == 4);
    
            A.print("A: ");
            std::ofstream outputStream("bin.dat", std::ios::binary);
            boost::archive::binary_oarchive oa(outputStream);
            oa& A;
        }
    
        {
            std::ifstream inputStream("bin.dat", std::ios::binary);
            boost::archive::binary_iarchive ia(inputStream);
    
            arma::sp_mat B(3,3);
            B(0,0) = 77; // some old data should be cleared
    
            ia& B;
    
            B.print("B: ");
        }
    }
    

    Prints

    A:
    [matrix size: 3x3; n_nonzero: 4; density: 44.44%]
    
         (1, 0)         0.2505
         (0, 1)         0.9467
         (0, 2)         0.2513
         (2, 2)         0.5206
    
    B:
    [matrix size: 3x3; n_nonzero: 4; density: 44.44%]
    
         (1, 0)         0.2505
         (0, 1)         0.9467
         (0, 2)         0.2513
         (2, 2)         0.5206