Tags: c++, x86, comparison, avx, avx2

AVX2 _mm256_cmp_pd to return number values


My goal is to vectorize comparisons so that I can use the results as masks later.

The problem is that _mm256_cmp_pd returns NaN instead of 1.0. What is the correct way to do comparisons in AVX2?

AVX2 code:

__m256d _numberToCompare = _mm256_set1_pd(1.0);
__m256d _compareConditions = _mm256_set_pd(0.0, 1.0, 2.0, 3.0);

__m256d _result = _mm256_cmp_pd(_numberToCompare, _compareConditions, _CMP_LT_OQ); // a < b, ordered (non-signalling)
alignas(32) double res[4]; // _mm256_store_pd requires 32-byte alignment
_mm256_store_pd(&res[0], _result);
for (auto i : res) {
    std::cout << i << '\t';
}
std::cout << '\n';

__m256d _result2 = _mm256_cmp_pd(_numberToCompare, _compareConditions, _CMP_LE_OQ); // a <= b, ordered (non-signalling)
alignas(32) double res2[4]; // _mm256_store_pd requires 32-byte alignment
_mm256_store_pd(&res2[0], _result2);
for (auto i : res2) {
    std::cout << i << '\t';
}
std::cout << '\n';


Expected result (what I would get in scalar code):

0 0 1 1
0 1 1 1

Actual result:

-nan    -nan    0       0
-nan    -nan    -nan    0
  1. Why is the result of the comparison NaN?
  2. What is the correct way to get the expected result?

Solution

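  • Ad 1: The result is a bitmask: each 64-bit element is all ones (0xffff'ffff'ffff'ffff) for true or all zeros for false, so it can be used with bitwise operators. Reinterpreted as a double, the all-ones pattern is a NaN with the sign bit set, which is why it prints as -nan.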

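    Ad 2: You can compute _result = _mm256_and_pd(_result, _mm256_set1_pd(1.0)) if you really want 1.0 and 0.0 (but usually, using the bitmask directly is more efficient). ANDing the bit pattern of 1.0 with all ones leaves 1.0, and ANDing it with all zeros gives 0.0.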

    Also be aware that _mm256_set_pd takes the arguments in "big-endian" order, i.e., the element with the highest address is the first argument (don't ask me why Intel decided that way) -- you can use _mm256_setr_pd instead if you prefer little-endian.
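
The points above can be put together in a minimal sketch. The helper names (less_than_as_numbers, less_than_as_bits, min_via_mask) and the AVX_TARGET macro are made up for this illustration and are not part of any library. The macro relies on a GCC/Clang extension so the file also builds without -mavx; that is an assumption about the toolchain, and compiling with -mavx works as well. Inputs are loaded from arrays, so element 0 is the lowest lane, the same order _mm256_setr_pd would give.

```cpp
#include <immintrin.h>
#include <array>

// Assumption: GCC/Clang. Enables AVX for the marked functions even when the
// translation unit is not compiled with -mavx. Expands to nothing elsewhere.
#if defined(__GNUC__)
#define AVX_TARGET __attribute__((target("avx")))
#else
#define AVX_TARGET
#endif

using Vec4 = std::array<double, 4>;

// Lane-wise a < b, converted from the all-ones/all-zeros mask to 1.0 / 0.0.
AVX_TARGET inline Vec4 less_than_as_numbers(const Vec4& a, const Vec4& b) {
    __m256d va = _mm256_loadu_pd(a.data());  // lane i = a[i], like _mm256_setr_pd
    __m256d vb = _mm256_loadu_pd(b.data());
    __m256d mask = _mm256_cmp_pd(va, vb, _CMP_LT_OQ);
    // Keep the bits of 1.0 where the mask is all ones, clear them elsewhere.
    __m256d numbers = _mm256_and_pd(mask, _mm256_set1_pd(1.0));
    Vec4 out;
    _mm256_storeu_pd(out.data(), numbers);   // unaligned store: no alignas needed
    return out;
}

// Same comparison, packed into an int: bit i is set when a[i] < b[i].
AVX_TARGET inline int less_than_as_bits(const Vec4& a, const Vec4& b) {
    __m256d va = _mm256_loadu_pd(a.data());
    __m256d vb = _mm256_loadu_pd(b.data());
    __m256d mask = _mm256_cmp_pd(va, vb, _CMP_LT_OQ);
    return _mm256_movemask_pd(mask);         // collects the sign bit of each lane
}

// Using the mask directly, without converting it to numbers:
// pick a[i] where a[i] < b[i], otherwise b[i] (a lane-wise minimum).
AVX_TARGET inline Vec4 min_via_mask(const Vec4& a, const Vec4& b) {
    __m256d va = _mm256_loadu_pd(a.data());
    __m256d vb = _mm256_loadu_pd(b.data());
    __m256d mask = _mm256_cmp_pd(va, vb, _CMP_LT_OQ);
    // blendv takes the second operand where the mask's sign bit is set.
    __m256d picked = _mm256_blendv_pd(vb, va, mask);
    Vec4 out;
    _mm256_storeu_pd(out.data(), picked);
    return out;
}
```

With a = {1, 1, 1, 1} and b = {0, 1, 2, 3}, less_than_as_numbers(a, b) gives 0 0 1 1, which is the first expected row from the question. The movemask and blendv variants show why keeping the mask is usually preferable: it feeds straight into selection or branching without an extra conversion step.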