I have huge dynamically created tables/matrices in MATLAB of varying first dimension, whose rows represent (sorted) combinations of integers in the range 1-50 of order 6.
I would like to assign to each combination a unique value (hash, ranking), so that I can check if the same combinations appear in different tables. Different combinations are not allowed to have same value assigned, i.e. no collisions. I have to make a lot of such comparisons between a lot of such tables. So, for performance reasons, I would like to accomplish this by vectorization of uint32
operations to make it suitable for GPU acceleration in MATLAB.
Things I have thought of so far:
parfor
it through the rows, which is slower than other options. IIRC, the direct explicit formula, though vectorizable, requires computation of binomials, which in turn requires log Gamma
function in order to avoid huge factorials + double
type to avoid collision if I am not mistaken, i.e. is slower because it's 'very numerical'.uint32
and is definitely slower than other options.(x_1,...,x_6)
to x_1 + x_2 * 51 + ... + x_6 * 51^5
. This is the fastest I currently have. It's easily vectorizable, but unfortunately still requires uint64
or double
for rank-6 combinations of 50 elements, which is slower than uint32
or single
type operations would be.So, I guess, I am looking for a 'clever' injective function on these combinations that computes within the
uint32
range and is also well vectorizable (in MATLAB).
Any help would be much appreciated!
EDIT: Here is a routine that benchmarks both ranking and searching in uint32
, single
, and double
. I have used MATLAB's gputimeit
to produce accurate results.
% testing parameters:
N = 26;
ord = 5;
% base-(N+1):
basevec_uint32 = gpuArray(repmat(uint32(N+1),1,ord).^cast(0:ord-1,'uint32'));
basevec_single = cast(basevec_uint32,'single').';
basevec_double = cast(basevec_uint32,'double').';
% highest hash value:
max_hash_value = gpuArray(cast(N-ord+1:N,'single'))*basevec_single
% benchmark GPU-accelerated base-(N+1) ranking for uint32, single, and double:
X_uint16 = randi(N,15000000,ord,'uint16','gpuArray');
X_uint32 = cast(X_uint16,'uint32');
X_single = cast(X_uint16,'single');
X_double = cast(X_uint16,'double');
Y_uint16 = randi(N,5000000,ord,'uint16','gpuArray');
Y_uint32 = cast(Y_uint16,'uint32');
Y_single = cast(Y_uint16,'single');
Y_double = cast(Y_uint16,'double');
ranking_uint32 = @() sum(bsxfun(@times,X_uint32,basevec_uint32),2,'native');
ranking_single = @() X_single*basevec_single;
ranking_double = @() X_double*basevec_double;
disp('ranking in uint32:'); gputimeit(ranking_uint32,1)
disp('ranking in single:'); gputimeit(ranking_single,1)
disp('ranking in double:'); gputimeit(ranking_double,1)
% benchmark GPU-accelerated searching in uint32, single, and double matrices:
X_uint32_ranks = myfun_uint32(); Y_uint32_ranks = sum(bsxfun(@times,Y_uint32,basevec_uint32),2,'native');
X_single_ranks = myfun_single(); Y_single_ranks = Y_single*basevec_single;
X_double_hash = myfun_double(); Y_double_ranks = Y_double*basevec_double;
search_uint32 = @() ismember(X_uint32_ranks,Y_uint32_ranks);
search_single = @() ismember(X_single_ranks,Y_single_ranks);
search_double = @() ismember(X_double_ranks,Y_double_ranks);
disp('searching in uint32:'); gputimeit(search_uint32,1)
disp('searching in single:'); gputimeit(search_single,1)
disp('searching in double:'); gputimeit(search_double,1)
I am using nVidia GTX 1060, which gives me the following benchmark results under Windows (I don't know, however, how much WDDM influences gputimeit
):
>> test1
max_rank_value =
gpuArray single
14327680
ranking in uint32:
ans =
0.0129
ranking in single:
ans =
0.0063
ranking in double:
ans =
0.0108
searching in uint32:
ans =
0.1572
searching in single:
ans =
0.1577
searching in double:
ans =
0.1599
Conclusion: The choice of type does not impact the GPU-accelerated search via
ismember
(expected), however there is a noticeable difference in the ranking speed coming from the matrix multiplication. Ranking insingle
is almost twice as fast as ranking indouble
, while ranking inuint32
is nearly the same as ranking indouble
. MATLAB does not currently support GPU-accelerateduint32
matrix multiplication, so I am using a substitute function for this. Thus, it is much more reasonable to try to encode combinations of 50 elements of order 6 assingle
values, which, if I am not mistaken, has just enough bits to accomodate all 15890700 combinations assingle
integers.
You've almost got enough bits for your last idea, so you just need to squeeze a few bits out due to the ordering to get it over the bar. Since the whole sequence is sorted, every pair is also ordered. So use a 50-by-50 look-up table to map the sorted (1st,2nd), (3rd,4th), (5th,6th) pairs into numbers from 0-1274.
Or if you don't want a table, there are fairly simple explicit functions for mapping a pair (i,j) with j>=i to a linear index. Look up upper- or lower-triangular matrix indexing for details on those. (It'll be something along the lines of
n*(n+1)/2 - (n-i)*(n-i-1)/2 + j
with some +/-1's thrown in depending on base-0 or base-1 indexing, and n=50 in your case, but I'm sure I'll get it wrong writing it off-the-cuff.)
Anyway, once you've got three numbers 0-1274, the base-1275 idea will fit in uint32
.