c++ sparse-matrix armadillo

Loading large matrix with Armadillo


I have a very sparse matrix, with a density of about 0.01 and dimensions 20000 x 500000. I'm trying to load it into Armadillo with

sp_mat V;
V.load(filename, coord_ascii);

The file format is

row column value

But this is taking way too long. Python can parse the file and fill a dictionary with it way faster than armadillo can create this matrix. How should I properly do this?

The matrix is going to be filled with integers.

Any advice would be appreciated!

Update:

This is an issue solely with Armadillo. C++ iterates the file without issue when read line by line, but assigning the values into an arma::sp_mat is extremely slow.


Solution

  • The Armadillo documentation states

    "Using batch insertion constructors is generally much faster than consecutively inserting values using element access operators"

    So here is the best I could come up with:

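    #include <armadillo>
    #include <fstream>
    #include <vector>

    using namespace arma;
    using namespace std;

    sp_mat get(const char *filename) {
        vector<uword> location_u;  // row indices
        vector<uword> location_m;  // column indices
        vector<double> values;     // nonzero values

        ifstream file(filename);
        uword a, b;
        double c;
        // Each line of the file is "row column value".
        while (file >> a >> b >> c) {
            location_u.push_back(a);
            location_m.push_back(b);
            values.push_back(c);
        }

        // Stack the indices into the 2 x N location matrix that the
        // batch insertion constructor expects (row 0: rows, row 1: columns).
        umat lu(location_u);
        umat lm(location_m);
        umat location(join_rows(lu, lm).t());

        // Build the whole sparse matrix in one step.
        return sp_mat(location, vec(values));
    }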
    

    It now runs at a reasonable speed, at about 1 million lines a second.
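
For some intuition on why the batch constructor helps: the Armadillo documentation describes sparse matrices as stored in compressed sparse column (CSC) format. In a plain CSC layout, inserting one element at a time can mean shifting every entry stored after it, whereas a batch build can place all entries in two passes over the data. Below is a minimal standard-library sketch of that idea. It is not Armadillo's actual implementation, and `CscSketch`, `Triplet`, `read_triplets` and `build_csc` are hypothetical names made up for illustration. It assumes zero-based indices and no duplicate (row, column) pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical types for illustration only; these are not Armadillo types.
struct Triplet {
    std::size_t row;
    std::size_t col;
    double value;
};

struct CscSketch {
    std::size_t n_rows = 0;
    std::size_t n_cols = 0;
    std::vector<double> values;        // nonzero values, grouped by column
    std::vector<std::size_t> row_idx;  // row index of each stored value
    std::vector<std::size_t> col_ptr;  // start offset of each column (size n_cols + 1)
};

// Read whitespace-separated "row column value" records from a stream.
std::vector<Triplet> read_triplets(std::istream& in) {
    std::vector<Triplet> out;
    Triplet t;
    while (in >> t.row >> t.col >> t.value) {
        out.push_back(t);
    }
    return out;
}

// Build the CSC arrays in two passes (a counting sort by column).
// Cost is O(nnz + n_cols), with no per-element shifting.
// Rows inside a column keep their input order; duplicates are not merged.
CscSketch build_csc(const std::vector<Triplet>& triplets,
                    std::size_t n_rows, std::size_t n_cols) {
    CscSketch m;
    m.n_rows = n_rows;
    m.n_cols = n_cols;
    m.values.resize(triplets.size());
    m.row_idx.resize(triplets.size());
    m.col_ptr.assign(n_cols + 1, 0);

    // Pass 1: count how many entries fall into each column.
    for (const Triplet& t : triplets) {
        assert(t.row < n_rows && t.col < n_cols);
        ++m.col_ptr[t.col + 1];
    }
    // A prefix sum turns the counts into column start offsets.
    for (std::size_t c = 0; c < n_cols; ++c) {
        m.col_ptr[c + 1] += m.col_ptr[c];
    }

    // Pass 2: drop each entry into the next free slot of its column.
    std::vector<std::size_t> next(m.col_ptr.begin(), m.col_ptr.end() - 1);
    for (const Triplet& t : triplets) {
        const std::size_t k = next[t.col]++;
        m.row_idx[k] = t.row;
        m.values[k] = t.value;
    }
    return m;
}
```

To try it, pass any `std::istream` (an `std::ifstream` on the data file, or an `std::istringstream` for a quick test) to `read_triplets`, then hand the result to `build_csc` together with the matrix dimensions.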