My goal is to vectorize comparisons so that I can use the results as masks later. The problem is that _mm256_cmp_pd seems to return NaN instead of 1.0. What is the correct way to do comparisons in AVX2?
AVX2 code:
#include <immintrin.h>
#include <iostream>

__m256d _numberToCompare = _mm256_set1_pd(1.0);
__m256d _compareConditions = _mm256_set_pd(0.0, 1.0, 2.0, 3.0);
__m256d _result = _mm256_cmp_pd(_numberToCompare, _compareConditions, _CMP_LT_OQ); //a < b ordered (non-signalling)
alignas(32) double res[4]; // _mm256_store_pd requires 32-byte alignment
_mm256_store_pd(&res[0], _result);
for (auto i : res) {
std::cout << i << '\t';
}
__m256d _result2 = _mm256_cmp_pd(_numberToCompare, _compareConditions, _CMP_LE_OQ); //a <= b ordered (non-signalling)
alignas(32) double res2[4];
_mm256_store_pd(&res2[0], _result2);
for (auto i : res2) {
std::cout << i << '\t';
}
std::cout << '\n';
Expected result (what I would get in scalar code):
0 0 1 1
0 1 1 1
Actual result:
-nan -nan 0 0
-nan -nan -nan 0
Ad 1: The result is a bitmask: all 64 bits set (0xffff'ffff'ffff'ffff) for true, or 0 for false, which can be used directly with bitwise operators. Reinterpreted as a double, the all-ones pattern happens to be a NaN with the sign bit set, which is why your output prints -nan.
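To show how such a mask is typically consumed, here is a minimal, self-contained sketch that uses _mm256_blendv_pd to select between two vectors based on the comparison mask; the values 100.0/-100.0 and the variable names are only illustrative:

#include <immintrin.h>
#include <iostream>

int main() {
    __m256d a    = _mm256_set1_pd(1.0);
    __m256d b    = _mm256_setr_pd(0.0, 1.0, 2.0, 3.0);
    __m256d mask = _mm256_cmp_pd(a, b, _CMP_LT_OQ);   // all-ones where a < b, zero otherwise

    // Pick elements from 'onTrue' where the mask is set, from 'onFalse' elsewhere.
    __m256d onTrue   = _mm256_set1_pd(100.0);
    __m256d onFalse  = _mm256_set1_pd(-100.0);
    __m256d selected = _mm256_blendv_pd(onFalse, onTrue, mask);

    alignas(32) double out[4];
    _mm256_store_pd(out, selected);
    for (double v : out) std::cout << v << '\t';   // prints: -100 -100 100 100
    std::cout << '\n';
}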
Ad 2: You can compute _result = _mm256_and_pd(_result, _mm256_set1_pd(1.0)) if you really want 1.0 and 0.0 (but usually, using the bitmask directly is more efficient).
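A short sketch of that conversion, assuming the same inputs as in your example but built with _mm256_setr_pd so the element order matches the scalar expectation:

#include <immintrin.h>
#include <iostream>

int main() {
    __m256d a    = _mm256_set1_pd(1.0);
    __m256d b    = _mm256_setr_pd(0.0, 1.0, 2.0, 3.0); // element 0 is 0.0, element 3 is 3.0
    __m256d mask = _mm256_cmp_pd(a, b, _CMP_LE_OQ);

    // ANDing the all-ones pattern with the bit pattern of 1.0 yields 1.0;
    // ANDing the all-zero pattern with anything yields 0.0.
    __m256d ones = _mm256_and_pd(mask, _mm256_set1_pd(1.0));

    alignas(32) double out[4];
    _mm256_store_pd(out, ones);
    for (double v : out) std::cout << v << '\t';   // prints: 0 1 1 1
    std::cout << '\n';
}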
Also be aware that _mm256_set_pd takes its arguments in "big-endian" order, i.e., the element with the highest index (and highest address when stored) is the first argument (don't ask me why Intel decided that way) -- you can use _mm256_setr_pd instead if you prefer little-endian order. This is also why your printed results appear in the reverse of the order you expected.
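A minimal sketch showing that the two constructors build the same vector when the argument order is reversed:

#include <immintrin.h>
#include <iostream>

int main() {
    // set_pd lists elements from the highest index down to index 0,
    // setr_pd lists them from index 0 up; these two vectors are identical.
    __m256d v1 = _mm256_set_pd (3.0, 2.0, 1.0, 0.0);
    __m256d v2 = _mm256_setr_pd(0.0, 1.0, 2.0, 3.0);

    alignas(32) double out1[4], out2[4];
    _mm256_store_pd(out1, v1);
    _mm256_store_pd(out2, v2);
    for (int i = 0; i < 4; ++i)
        std::cout << out1[i] << '/' << out2[i] << '\t';   // prints: 0/0  1/1  2/2  3/3
    std::cout << '\n';
}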