Search code examples
matlabhashvectorizationgpgpuranking

Vectorized hashing/ranking of integer combinations of fixed size via operations on 32-bit integers in MATLAB


I have huge dynamically created tables/matrices in MATLAB of varying first dimension, whose rows represent (sorted) combinations of integers in the range 1-50 of order 6.

I would like to assign to each combination a unique value (hash, ranking), so that I can check if the same combinations appear in different tables. Different combinations are not allowed to have same value assigned, i.e. no collisions. I have to make a lot of such comparisons between a lot of such tables. So, for performance reasons, I would like to accomplish this by vectorization of uint32 operations to make it suitable for GPU acceleration in MATLAB.

Things I have thought of so far:

  1. Lexicographic ranking: no idea how to vectorize the standard fast recursive algorithms well, and the only option seems to be to parfor it through the rows, which is slower than other options. IIRC, the direct explicit formula, though vectorizable, requires computation of binomials, which in turn requires log Gamma function in order to avoid huge factorials + double type to avoid collision if I am not mistaken, i.e. is slower because it's 'very numerical'.
  2. Cantor pairing function: one can successively apply Cantor's pairing, which is nice because it's a polynomial expression, but it produces huge numbers well beyond uint32 and is definitely slower than other options.
  3. Base 51 (no pun intended) integers: sends a combination/row vector (x_1,...,x_6) to x_1 + x_2 * 51 + ... + x_6 * 51^5. This is the fastest I currently have. It's easily vectorizable, but unfortunately still requires uint64 or double for rank-6 combinations of 50 elements, which is slower than uint32 or single type operations would be.

So, I guess, I am looking for a 'clever' injective function on these combinations that computes within the uint32 range and is also well vectorizable (in MATLAB).

Any help would be much appreciated!

EDIT: Here is a routine that benchmarks both ranking and searching in uint32, single, and double. I have used MATLAB's gputimeit to produce accurate results.

% testing parameters:
N = 26;
ord = 5;

% base-(N+1):
basevec_uint32 = gpuArray(repmat(uint32(N+1),1,ord).^cast(0:ord-1,'uint32'));
basevec_single = cast(basevec_uint32,'single').';
basevec_double = cast(basevec_uint32,'double').';

% highest hash value:
max_hash_value = gpuArray(cast(N-ord+1:N,'single'))*basevec_single

% benchmark GPU-accelerated base-(N+1) ranking for uint32, single, and double:
X_uint16 = randi(N,15000000,ord,'uint16','gpuArray');
X_uint32 = cast(X_uint16,'uint32');
X_single = cast(X_uint16,'single');
X_double = cast(X_uint16,'double');

Y_uint16 = randi(N,5000000,ord,'uint16','gpuArray');
Y_uint32 = cast(Y_uint16,'uint32');
Y_single = cast(Y_uint16,'single');
Y_double = cast(Y_uint16,'double');

ranking_uint32 = @() sum(bsxfun(@times,X_uint32,basevec_uint32),2,'native');
ranking_single = @() X_single*basevec_single;
ranking_double = @() X_double*basevec_double;

disp('ranking in uint32:'); gputimeit(ranking_uint32,1)
disp('ranking in single:'); gputimeit(ranking_single,1)
disp('ranking in double:'); gputimeit(ranking_double,1)

% benchmark GPU-accelerated searching in uint32, single, and double matrices:
X_uint32_ranks = myfun_uint32(); Y_uint32_ranks = sum(bsxfun(@times,Y_uint32,basevec_uint32),2,'native');
X_single_ranks = myfun_single(); Y_single_ranks = Y_single*basevec_single;
X_double_hash = myfun_double(); Y_double_ranks = Y_double*basevec_double;

search_uint32 = @() ismember(X_uint32_ranks,Y_uint32_ranks);
search_single = @() ismember(X_single_ranks,Y_single_ranks);
search_double = @() ismember(X_double_ranks,Y_double_ranks);

disp('searching in uint32:'); gputimeit(search_uint32,1)
disp('searching in single:'); gputimeit(search_single,1)
disp('searching in double:'); gputimeit(search_double,1)

I am using nVidia GTX 1060, which gives me the following benchmark results under Windows (I don't know, however, how much WDDM influences gputimeit):

>> test1

max_rank_value =

gpuArray single

14327680

ranking in uint32:

ans =

0.0129

ranking in single:

ans =

0.0063

ranking in double:

ans =

0.0108

searching in uint32:

ans =

0.1572

searching in single:

ans =

0.1577

searching in double:

ans =

0.1599

Conclusion: The choice of type does not impact the GPU-accelerated search via ismember (expected), however there is a noticeable difference in the ranking speed coming from the matrix multiplication. Ranking in single is almost twice as fast as ranking in double, while ranking in uint32 is nearly the same as ranking in double. MATLAB does not currently support GPU-accelerated uint32 matrix multiplication, so I am using a substitute function for this. Thus, it is much more reasonable to try to encode combinations of 50 elements of order 6 as single values, which, if I am not mistaken, has just enough bits to accomodate all 15890700 combinations as single integers.


Solution

  • You've almost got enough bits for your last idea, so you just need to squeeze a few bits out due to the ordering to get it over the bar. Since the whole sequence is sorted, every pair is also ordered. So use a 50-by-50 look-up table to map the sorted (1st,2nd), (3rd,4th), (5th,6th) pairs into numbers from 0-1274.

    Or if you don't want a table, there are fairly simple explicit functions for mapping a pair (i,j) with j>=i to a linear index. Look up upper- or lower-triangular matrix indexing for details on those. (It'll be something along the lines of n*(n+1)/2 - (n-i)*(n-i-1)/2 + j with some +/-1's thrown in depending on base-0 or base-1 indexing, and n=50 in your case, but I'm sure I'll get it wrong writing it off-the-cuff.)

    Anyway, once you've got three numbers 0-1274, the base-1275 idea will fit in uint32.