The SIMDKR string matching algorithm uses _mm256_movemask_epi8 to convert a Vector256 to an int by extracting the high bit of each byte.
I want to port this C algorithm to C#, using Vector512 instead of Vector256, but I can't find a method that does the same thing.
There is an Avx2.MoveMask(), but no Avx512F/BW/VBMI/DQ.MoveMask.
const __m256i first = _mm256_set1_epi8(needle[0]);
const __m256i last = _mm256_set1_epi8(needle[m - 1]);
const __m256i block_first1 = _mm256_loadu_si256((const __m256i *)(s + i));
const __m256i block_last1 = _mm256_loadu_si256((const __m256i *)(s + i + m - 1));
const __m256i eq_first1 = _mm256_cmpeq_epi8(first, block_first1);
const __m256i eq_last1 = _mm256_cmpeq_epi8(last, block_last1);
const uint32_t mask1 = _mm256_movemask_epi8(_mm256_and_si256(eq_first1, eq_last1));
I use bit operations to emulate _mm512_movepi8_mask, like this:
ulong mask = ((ulong)Avx2.MoveMask(buffer.GetUpper()) << 32) | (uint)Avx2.MoveMask(buffer.GetLower());
Is this right? Does it have the best performance?
AVX512 differs a bit from AVX2 (outside of C# as well) when it comes to extracting a mask of the high bits: VPMOVMSKB has no direct 512-bit equivalent. In raw AVX512 you convert a vector to a mask (the AVX512 concept of a mask) with the VPMOVB2M/VPMOVW2M/VPMOVD2M/VPMOVQ2M family of instructions, and then move the mask from a mask register to a general-purpose register with the kmov family of instructions.
C# treats masks a bit differently than raw AVX512 does: masks are mostly represented via the Vector512<T> type as well, so you're not normally working with the mask as an integer. I'm not entirely sure yet what the implications of that are for mask-manipulation code. Still, you can do both of those steps (converting a vector to a mask and moving it from a mask register to a general-purpose register) in a single call with Vector512.ExtractMostSignificantBits.
I tried that under .NET 8 and I got assembly code like this:
vpmovb2m k1,zmm0
kmovq rax,k1
Looks good to me.
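For illustration, here is a minimal sketch of that call in C#. The class and method names (MoveMaskSketch, MoveMask, Demo) are made up for this example, and the test bytes are arbitrary. It only assumes .NET 8 or later, where Vector512 is available; on hardware without AVX512 the same call should still work through a software fallback, just without the vpmovb2m/kmovq codegen.

```csharp
using System;
using System.Runtime.Intrinsics;

static class MoveMaskSketch
{
    // Bit i of the result is the most significant bit of byte lane i.
    public static ulong MoveMask(Vector512<byte> v) =>
        Vector512.ExtractMostSignificantBits(v);

    // Hypothetical demo input: only lanes 0, 5 and 63 have their high bit set.
    public static ulong Demo()
    {
        byte[] bytes = new byte[64];
        bytes[0] = 0x80;
        bytes[5] = 0xFF;
        bytes[10] = 0x7F; // high bit clear, so bit 10 stays 0
        bytes[63] = 0x80;
        return MoveMask(Vector512.Create(bytes)); // 0x8000000000000021
    }
}
```

The result is a ulong because a Vector512<byte> has 64 lanes, so this replaces the two Avx2.MoveMask calls and the shift/or from the question with one call.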
Going more into the actual context of a string comparison, C# gives you several comparisons:
Vector512.Equals, which returns a mask as a Vector512<T>.
Avx512BW.CompareEqual (this one is for bytes and words; comparisons for other types are in other classes), which also returns a mask as a Vector512<T>.
Vector512.EqualsAny and Vector512.EqualsAll, which don't return a mask at all, only a boolean. For both of them, with Vector512<byte> inputs, I got a comparison and kortestq, followed by a branch or setcc depending on how the boolean is used.
If you want the result of a comparison as a mask in an integer, you can combine e.g. Vector512.Equals with Vector512.ExtractMostSignificantBits. That doesn't pointlessly convert a mask to a vector and then back to a mask; you get the right thing. I tried it and got this:
vpcmpeqb k1,zmm0,zmmword ptr [rax+50h]
kmovq rax,k1
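Applied to the C snippet from the question, a port might look roughly like the sketch below. This is only an assumption about how the pieces fit together, not a tuned implementation: the names FirstLastFilter, CandidateMask and Demo are invented for the example, the caller is assumed to guarantee that both 64-byte loads stay inside the haystack, and the candidates still need a full comparison afterwards, as in the original algorithm.

```csharp
using System;
using System.Runtime.Intrinsics;

static class FirstLastFilter
{
    // Bit k of the result is set when haystack[i + k] == needle[0]
    // and haystack[i + k + m - 1] == needle[m - 1].
    // Assumes i + (m - 1) + 64 <= haystack.Length (no bounds handling here).
    public static ulong CandidateMask(ReadOnlySpan<byte> haystack, ReadOnlySpan<byte> needle, int i)
    {
        int m = needle.Length;
        Vector512<byte> first = Vector512.Create(needle[0]);
        Vector512<byte> last = Vector512.Create(needle[m - 1]);
        Vector512<byte> blockFirst = Vector512.Create(haystack.Slice(i, 64));
        Vector512<byte> blockLast = Vector512.Create(haystack.Slice(i + m - 1, 64));
        Vector512<byte> eq = Vector512.Equals(first, blockFirst) & Vector512.Equals(last, blockLast);
        return eq.ExtractMostSignificantBits();
    }

    // Hypothetical demo: "abc" occurs at offsets 2 and 9; offset 20 has an 'a'
    // but no matching 'c' two bytes later, so it is filtered out.
    public static ulong Demo()
    {
        byte[] haystack = new byte[66]; // 64 + m - 1 bytes for m = 3
        Array.Fill(haystack, (byte)'x');
        ReadOnlySpan<byte> needle = "abc"u8;
        needle.CopyTo(haystack.AsSpan(2));
        needle.CopyTo(haystack.AsSpan(9));
        haystack[20] = (byte)'a';
        return CandidateMask(haystack, needle, 0); // bits 2 and 9 set: 0x204
    }
}
```

Each set bit can then be walked with System.Numerics.BitOperations.TrailingZeroCount and cleared with mask &= mask - 1, in the same way the 32-bit mask is consumed in the AVX2 version.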