I am trying to write a hardware accelerated hash table using AVX, where each bucket has a fixed size (AVX vector size). The question arose of how to implement a quick search by vector.
Incomplited possible solution:
example target hash: 2
<1 7 8 9 2 6 3 5> // vector of hashes
<2 2 2 2 2 2 2 2> // mask vector of target hash
------------------------ // equality comparison
<0 0 0 0 -1 0 0 0> // result of comparison
<0 1 2 3 4 5 6 7> // vector of indexes
------------------------ // and operation
<0 0 0 0 4 0 0 0> // index of target hash
How to extract index of target hash from last vector?
Another (slow) possible solution using a scalar product:
<1 7 8 9 2 6 3 5> // vector of hashes
<2 2 2 2 2 2 2 2> // mask vector of target hash
------------------------ // equality comparison
<0 0 0 0 -1 0 0 0> // result of comparison
<0 1 2 3 4 5 6 7> // vector of indexes
------------------------ // dot
-4
The appropriate horizontal operation for this is MOVMSKPS, which extracts a mask from an XMM/YMM vector (basically, it gathers the top bit from each lane). Once you've got that, you can do TZCNT or LZCNT to get to an index.
Example:
#include <intrin.h>
#include <immintrin.h>
int getIndexOf(int const values[8], int target)
{
__m256i valuesSimd = _mm256_loadu_si256((__m256i const*)values);
__m256i targetSplatted = _mm256_set1_epi32(target);
__m256i equalBits = _mm256_cmpeq_epi32(valuesSimd, targetSplatted);
unsigned equalMask = _mm256_movemask_ps(_mm256_castsi256_ps(equalBits));
int index = _tzcnt_u32(equalMask);
return index;
}