Tags: c++, simd, avx, avx2, logical-and

What is the fastest way to compute a logical AND (&&) between the elements of two __m256i variables, looking for any pair of non-zero elements?


As far as I know, integers in C++ can be used as booleans, so we can write code like this:

int a = 6, b = 10;
if (a && b) { /* do something */ }  // true, since both a and b are non-zero

Now, assume that we have:

__m256i a, b;

I need to apply a logical AND (&&) to each of the four 64-bit elements of the two __m256i values, and return true if at least one pair has both elements non-zero. I mean something like:

(a[0] && b[0]) || (a[1] && b[1]) || ...

Is there a fast way to do this with AVX or AVX2?

I could not find a single instruction for this purpose, and a bitwise AND (&) is definitely not the same thing.
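
To pin down the semantics being asked for, here is a minimal scalar sketch. The `Lanes` alias and both function names are hypothetical, chosen only for illustration; `Lanes` stands in for the four 64-bit elements of a __m256i. It also shows why a plain bitwise AND gives a different answer.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the four 64-bit lanes of a __m256i.
using Lanes = std::array<std::int64_t, 4>;

// Desired result: true if some lane i has a[i] != 0 and b[i] != 0.
bool any_pair_nonzero_scalar(const Lanes& a, const Lanes& b) {
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] && b[i]) return true;
    }
    return false;
}

// What a plain bitwise AND tests instead: some lane with (a[i] & b[i]) != 0.
bool any_bitwise_and_nonzero(const Lanes& a, const Lanes& b) {
    for (std::size_t i = 0; i < a.size(); ++i) {
        if ((a[i] & b[i]) != 0) return true;
    }
    return false;
}
```

For a = {1, 0, 0, 0} and b = {2, 0, 0, 0}, the first function returns true while the second returns false, because 1 & 2 == 0 even though both values are non-zero.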


Solution

  • You can cleverly combine a vpcmpeqq with a vptest:

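    // Lanes where a[i] == 0 become all-ones; all other lanes become zero.
    __m256i mask = _mm256_cmpeq_epi64(a, _mm256_set1_epi64x(0));
    // True if any bit of b survives in a lane where a[i] != 0.
    bool result = ! _mm256_testc_si256(mask, b);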
    

    _mm256_testc_si256(mask, b) returns 1 exactly when (~mask & b) == 0, so the result is true if and only if (~mask & b) != 0, that is:

    ((a[i]==0 ? 0 : -1) & b[i]) != 0 // for some i
    // equivalent to
    ((a[i]==0 ? 0 : b[i])) != 0      // for some i
    // equivalent to
    a[i]!=0 && b[i]!=0               // for some i
    

    which is equivalent to what you want.

    Godbolt-link (play around with a and b): https://godbolt.org/z/aTjx7vMKd

    If result is used as a branch or loop condition, the compiler should branch directly on the carry flag set by vptest (jb/jnb) instead of materializing it with setnb.
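
If the data lives in memory rather than already in registers, the same two intrinsics can be wrapped roughly as follows. This is only a sketch under some assumptions: the function names are made up for illustration, the `target("avx2")` attribute and `__builtin_cpu_supports` are GCC/Clang-specific, and the scalar fallback is there just so the file still builds and runs where AVX2 is unavailable.

```cpp
#include <cstdint>

#if defined(__x86_64__) && defined(__GNUC__)
#include <immintrin.h>

// Assumption: GCC or Clang. The target attribute enables AVX2 for this one
// function, so the file compiles even without -mavx2.
__attribute__((target("avx2")))
static bool any_pair_nonzero_avx2(const std::int64_t* a, const std::int64_t* b) {
    // Unaligned loads of four 64-bit elements each.
    const __m256i va = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a));
    const __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b));
    // Lanes where a[i] == 0 become all-ones; all other lanes become zero.
    const __m256i mask = _mm256_cmpeq_epi64(va, _mm256_setzero_si256());
    // testc returns 1 iff (~mask & vb) == 0, so negate it.
    return !_mm256_testc_si256(mask, vb);
}
#endif

// Hypothetical wrapper: a and b must each point to four 64-bit integers.
bool any_pair_nonzero(const std::int64_t* a, const std::int64_t* b) {
#if defined(__x86_64__) && defined(__GNUC__)
    // Runtime check so the AVX2 path only runs on CPUs that support it.
    if (__builtin_cpu_supports("avx2")) {
        return any_pair_nonzero_avx2(a, b);
    }
#endif
    // Scalar fallback with the same semantics.
    for (int i = 0; i < 4; ++i) {
        if (a[i] && b[i]) return true;
    }
    return false;
}
```

In a hot loop you would normally skip the runtime check, compile with -mavx2, and call the intrinsic version directly so it can be inlined.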