Search code examples
c++oopdata-structureshashmaps

What is the best way to create a sparse array in C++?


I am working on a project that requires the manipulation of enormous matrices, specifically pyramidal summation for a copula calculation.

In short, I need to keep track of a relatively small number of values (usually a value of 1, and in rare cases more than 1) in a sea of zeros in the matrix (multidimensional array).

A sparse array allows the user to store a small number of values, and assume all undefined records to be a preset value. Since it is not physically possible to store all values in memory, I need to store only a few non-zero elements. This could be several million entries.

Speed is a huge priority, and I would also like to dynamically choose the number of variables in the class at runtime.

I currently work on a system that uses a binary search tree (b-tree) to store entries. Does anyone know of a better system?


Solution

  • For C++, a map works well. Several million objects won't be a problem. 10 million items took about 4.4 seconds and about 57 meg on my computer.

    My test application is as follows:

    #include <stdio.h>
    #include <stdlib.h>
    #include <map>
    
    class triple {
    public:
        int x;
        int y;
        int z;
        bool operator<(const triple &other) const {
            if (x < other.x) return true;
            if (other.x < x) return false;
            if (y < other.y) return true;
            if (other.y < y) return false;
            return z < other.z;
        }
    };
    
    int main(int, char**)
    {
        std::map<triple,int> data;
        triple point;
        int i;
    
        for (i = 0; i < 10000000; ++i) {
            point.x = rand();
            point.y = rand();
            point.z = rand();
            //printf("%d %d %d %d\n", i, point.x, point.y, point.z);
            data[point] = i;
        }
        return 0;
    }
    

    Now to dynamically choose the number of variables, the easiest solution is to represent index as a string, and then use string as a key for the map. For instance, an item located at [23][55] can be represented via "23,55" string. We can also extend this solution for higher dimensions; such as for three dimensions an arbitrary index will look like "34,45,56". A simple implementation of this technique is as follows:

    std::map data<string,int> data;
    char ix[100];
    
    sprintf(ix, "%d,%d", x, y); // 2 vars
    data[ix] = i;
    
    sprintf(ix, "%d,%d,%d", x, y, z); // 3 vars
    data[ix] = i;