Tags: c++, algorithm, sparse-array

Sparse O(1) array with indices being consecutive products


I'd like to pre-calculate an array of values of some unary function f.

I know that I'll only need the values of f(x) where x is of the form a*b, with both a and b integers in the range 0..N.

The obvious time-optimized choice is to make an array of size N*N and pre-calculate only the elements I'm actually going to read later. For f(a*b), I'd just check and set tab[a*b]. This is the fastest method possible - however, it takes a lot of space, because the array contains many indices (the first prime greater than N, for instance) which will never be touched.
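
For concreteness, here is a minimal sketch of that dense approach. The function f, the bound N, and the `computed` side array are placeholders of my own; the real f would be whatever expensive function is being cached:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Placeholder for the expensive unary function being cached.
double f(std::size_t x) { return std::sqrt(static_cast<double>(x)); }

constexpr std::size_t N = 1000;

// One slot per possible index 0..N*N; most slots are never touched.
std::vector<double> tab(N * N + 1);
std::vector<bool>   computed(N * N + 1, false);

// Lazy check-and-set during the precomputation phase.
void precompute(std::size_t a, std::size_t b) {
    const std::size_t i = a * b;
    if (!computed[i]) {
        tab[i] = f(i);
        computed[i] = true;
    }
}

// The hot path: one multiply and an unchecked read - branchless O(1).
double lookup(std::size_t a, std::size_t b) {
    return tab[a * b];
}
```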

Another solution is a simple tree map... but that slows down the lookup itself considerably by introducing lots of branches. No.

I wonder - is there any solution to make such an array less sparse and smaller, but still quick, branchless, and O(1) in lookup?

edit

I can hear lots of comments about a hash map... I'll proceed to benchmark how one behaves (I expect a significant performance drop compared to a plain array lookup due to branching; less than in trees, but still... let's see if I'm right!).

I'd like to emphasize: I'd most appreciate an analytical solution which uses some clever trick to take advantage of the fact that only "product-like" indices are used. I feel that this fact could be exploited to get a much better result than an average generic hash function, but I'm out of ideas myself.

edit

Following your advice, I've tried std::unordered_map from gcc 4.5. This was a tad slower than the simple array lookup, but indeed much faster than the tree-based std::map - ultimately I'm OK with this solution. I understand now why it's not possible to do what I originally intended; thanks for the explanations!

I'm just unsure whether the hash map actually saves any memory! :) As @Keith Randall has described, I cannot get the memory footprint lower than N*N/4, and the triangular matrix approach described by @Sjoerd gives me N*N/2. I think that it's entirely possible for the hash map to use more than N*N/2 space if the element size is small (it depends on the container overhead) - which would make the fastest approach also the most memory-efficient! I'll try to check that.
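
For reference, here is a sketch of the triangular layout as I read @Sjoerd's suggestion: since a*b == b*a, order each pair so that hi >= lo and index only the lower triangle, for roughly N*N/2 slots. The names and the index formula are my own; note that distinct pairs with the same product each get their own slot:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

constexpr std::size_t N = 1000;

// Lower triangle of an (N+1) x (N+1) matrix: (N+1)(N+2)/2 slots, ~N*N/2.
std::vector<double> tri((N + 1) * (N + 2) / 2);

// Row `hi` starts at offset hi*(hi+1)/2 and holds entries for lo = 0..hi.
inline std::size_t tri_index(std::size_t a, std::size_t b) {
    const std::size_t hi = std::max(a, b);
    const std::size_t lo = std::min(a, b);
    return hi * (hi + 1) / 2 + lo;
}

// Still plain array arithmetic; the only extra work is the min/max swap.
double lookup(std::size_t a, std::size_t b) {
    return tri[tri_index(a, b)];
}
```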

I wish I could accept 2 answers...


Solution

  • There doesn't seem to be a lot of structure to take advantage of here. If you're asking whether there is a way to arrange the table such that you can avoid storage for entries that can't happen (because they have a prime factor larger than N), you can't save much. There is a theory of smooth numbers which says that the density of N-smooth numbers near N^2 is roughly 2^-2 (more precisely, the Dickman function gives ρ(2) = 1 - ln 2 ≈ 0.31). So, in the absolute best case, you can reduce the (maximum) storage requirement by at most a factor of about 4.

    I think you're better off taking advantage of symmetry and then using a hash table if you expect most arguments to never occur.
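
A minimal sketch of that suggestion, with f and N as my own placeholders: key the hash table by the product itself, so a*b and b*a automatically share one entry, and use the symmetry to halve the precomputation loop:

```cpp
#include <cstddef>
#include <unordered_map>

// Placeholder for the expensive unary function being cached.
double f(std::size_t x) { return static_cast<double>(x) * 0.5; }

constexpr std::size_t N = 1000;

std::unordered_map<std::size_t, double> cache;

// Symmetry: only visit pairs with a <= b; products are keyed directly,
// so each distinct product is computed and stored exactly once.
void precompute_all() {
    for (std::size_t a = 0; a <= N; ++a)
        for (std::size_t b = a; b <= N; ++b) {
            const std::size_t p = a * b;
            if (cache.find(p) == cache.end())
                cache.emplace(p, f(p));
        }
}

double lookup(std::size_t a, std::size_t b) {
    return cache.at(a * b);   // throws std::out_of_range if never filled
}
```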