Search code examples
matlabdata-structuresbinary-treeimplementationsparse-matrix

A suitable data structure implementation for incrementing values in sparse indices in Matlab


Currently, I am using cell C which is the size of 1 x H, where H is a very huge number. This cell is mainly used to quickly store and increment some values in very sparse indices in O(1) time. For instance,

H = 10000000;
C = cell(1, H);
...
for i = 1:last

    % all values of someIndex(i) are values from 1 to H, unsorted, and contains repeating values.

    C{ someIndex(i) } = C{ someIndex(i) } + someValues(i);

end
...

Only a tiny indices in cell C will be used - I see, most of the time, about 1-10% of the whole thing. At first, the implementation is fine for smaller size database, but I am going for bigger database, which H will grow almost exponentially. Therefore, this naive implementation will no longer work.

I was also thinking of something like, instead of using cell, use arrays to store each index and value in this fashion:

Suppose we detect a new index:

IndexArray(size(IndexArray, 2) + 1) = someIndex(i);
ValueArray(size(ValueArray, 2) + 1) = someValue(i);

Suppose we detect an old index: (sure, we will need to sweep through the whole IndexArray and see whether such old index exist or not)

ValueArray( detectOldIndex(i) ) = ValueArray( detectOldIndex(i) ) + someValue(i);

However, there is also a problem with this. As more and more indices grow, sweeping through the whole IndexArray will cost more and more time. This is O(N).

Of course, for something like this, we definitely want to go with Trees, however, in Matlab, we don't have a valid Tree structure. I could think of using nested Cells in nested Cells. But the implementation could be quite ugly.

So if I am going to do something like this in Matlab, what are my faster choices?


Solution

  • Have you considered using Map container?

    The Map container is basically a hash table. Therefore, it should be rather efficient in mapping a small set of "active" indices over a HUGE domain.