What is the alternative to Avx2.MoveMask for Vector512<T>?


The SIMDKR string-matching algorithm uses _mm256_movemask_epi8 to convert a 256-bit vector to an int by extracting the high bit of each byte.

I want to implement this C algorithm in C#, using Vector512 instead of Vector256, but I can't find a method that does this. There is an Avx2.MoveMask(), but no Avx512F/BW/VBMI/DQ.MoveMask.

  const __m256i first = _mm256_set1_epi8(needle[0]);
  const __m256i last = _mm256_set1_epi8(needle[m - 1]);

  const __m256i block_first1 = _mm256_loadu_si256((const __m256i *)(s + i));
  const __m256i block_last1 = _mm256_loadu_si256((const __m256i *)(s + i + m - 1));

  const __m256i eq_first1 = _mm256_cmpeq_epi8(first, block_first1);
  const __m256i eq_last1 = _mm256_cmpeq_epi8(last, block_last1);

  const uint32_t mask1 = _mm256_movemask_epi8(_mm256_and_si256(eq_first1, eq_last1));

I used bit operations to emulate _mm512_movepi8_mask like this:

ulong mask = ((ulong)Avx2.MoveMask(buffer.GetUpper()) << 32) | (uint)Avx2.MoveMask(buffer.GetLower());

Is this correct? Does it give the best performance?


Solution

  • AVX-512 differs from AVX2 (outside of C# as well) when it comes to extracting a mask of the high bits: VPMOVMSKB has no direct 512-bit equivalent. In raw AVX-512 you first convert a vector to a mask (in the AVX-512 sense, a value in a mask register) with the VPMOVB2M/VPMOVW2M/VPMOVD2M/VPMOVQ2M family of instructions, and then move it from the mask register to a general-purpose register with the KMOV family of instructions.

    C# treats masks a bit differently than raw AVX-512 does: masks are mostly represented as Vector512<T> values as well, so you don't normally work with the mask as an integer. (I'm not entirely sure yet what the implications of that are for mask-manipulation code.) You can, however, do both steps (converting a vector to a mask and moving it from a mask register to a general-purpose register) at once with Vector512.ExtractMostSignificantBits.

    I tried that under .NET 8 and I got assembly code like this:

    vpmovb2m    k1,zmm0 
    kmovq       rax,k1
    

    Looks good to me.
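
    As a minimal sketch of the difference, the following compares Vector512.ExtractMostSignificantBits with the two-halves approach from the question. The class and method names (MoveMask512, ViaExtract, ViaAvx2Halves) are made up for illustration, and the second method assumes the caller has checked Avx2.IsSupported.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper class, not part of .NET.
public static class MoveMask512
{
    // Bit i of the result is the high bit of byte lane i (64 lanes -> ulong).
    public static ulong ViaExtract(Vector512<byte> v)
        => Vector512.ExtractMostSignificantBits(v);

    // The question's approach: two 256-bit movemasks stitched together.
    // Only call this when Avx2.IsSupported is true.
    public static ulong ViaAvx2Halves(Vector512<byte> v)
    {
        uint lo = (uint)Avx2.MoveMask(v.GetLower());
        uint hi = (uint)Avx2.MoveMask(v.GetUpper());
        return ((ulong)hi << 32) | lo;
    }
}
```

    Both methods should return the same value, so the question's expression is correct; the first one simply lets the JIT pick the instructions and also works (more slowly) on hardware without AVX-512.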

    Turning to the actual context, a string comparison: C# offers several comparison methods:

    • Vector512.Equals, which returns a mask as a Vector512<T>
    • Avx512BW.CompareEqual (this is for bytes and words; comparisons for other types are in other classes), which also returns a mask as a Vector512<T>
    • Vector512.EqualsAny and Vector512.EqualsAll, which don't return a mask at all, only a boolean (for both of them, with Vector512<byte> inputs, I got a comparison and a kortestq, followed by a branch or setcc depending on how the boolean is used)
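
    When only a yes/no answer is needed, the boolean form can be used directly. The snippet below is a small sketch; BlockChecks and ContainsByte are hypothetical names, and the span is assumed to hold at least 64 bytes.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical helper class, not part of .NET.
public static class BlockChecks
{
    // True if at least one of the first 64 bytes of `block` equals `value`.
    // Assumes block.Length >= 64; Vector512.Create throws otherwise.
    public static bool ContainsByte(ReadOnlySpan<byte> block, byte value)
    {
        Vector512<byte> data = Vector512.Create(block);
        return Vector512.EqualsAny(data, Vector512.Create(value));
    }
}
```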

    If you want the result of a comparison as a mask in an integer, you can combine, e.g., Vector512.Equals with Vector512.ExtractMostSignificantBits. That doesn't pointlessly convert a mask to a vector and then back to a mask; you get the right thing. I tried it and got this:

    vpcmpeqb    k1,zmm0,zmmword ptr [rax+50h]  
    kmovq       rax,k1
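
    Applied to the snippet from the question, a 512-bit version of the first/last-byte filter could look roughly like this. It is a sketch, not a full SIMDKR port: SimdKrSketch and CandidateMask are hypothetical names, and the caller is assumed to guarantee that i + (m - 1) + 64 <= s.Length.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical helper class, not part of .NET.
public static class SimdKrSketch
{
    // Bit k of the result is set when s[i + k] == needle[0] and
    // s[i + k + m - 1] == needle[m - 1], for k in 0..63.
    // Assumes i + (m - 1) + 64 <= s.Length and needle.Length >= 1.
    public static ulong CandidateMask(ReadOnlySpan<byte> s, int i, ReadOnlySpan<byte> needle)
    {
        int m = needle.Length;
        Vector512<byte> first = Vector512.Create(needle[0]);
        Vector512<byte> last = Vector512.Create(needle[m - 1]);

        // Each Create call reads 64 bytes starting at the slice offset.
        Vector512<byte> blockFirst = Vector512.Create(s.Slice(i));
        Vector512<byte> blockLast = Vector512.Create(s.Slice(i + m - 1));

        Vector512<byte> eq = Vector512.Equals(first, blockFirst)
                           & Vector512.Equals(last, blockLast);
        return eq.ExtractMostSignificantBits();
    }
}
```

    Each set bit is only a candidate position; the bytes in between still have to be compared. The set bits can be walked with System.Numerics.BitOperations.TrailingZeroCount(mask), clearing the lowest one each time with mask &= mask - 1.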