simd, powerpc, altivec

On PowerPC, is there an equivalent of Intel's movemask intrinsics?


I'd like to merge all elements of a __vector bool long long into a single int, in which each bit is set to the most significant bit of the corresponding element of the input vector.

example:

__vector bool long long vcmp = vec_cmplt(a, b);
int packedmask = /*SOME FUNCTION GOES HERE*/ (vcmp);

with

packedmask = x|y|0000000000000000....

where x equals 1 if vcmp[0] = 0xFFFFF... or 0 if vcmp[0] = 0; same for y with vcmp[1].

On Intel, we can achieve this with the _mm_movemask_* intrinsics (for two 64-bit elements, _mm_movemask_pd).

Is there any way to do the same on PowerPC?

Thank you for any help


Solution

  • Sounds like the vbpermq instruction (and the vec_vbpermq() intrinsic) would be appropriate here. Given a vector of unsigned char "indices" (i.e., 0 - 127), it uses each index to select a bit from the source vector into the output vector. If an index is 128 or greater, a zero bit is used instead.

    The 16 resulting bits are zero-extended to form a 64-bit value in the first doubleword of the result vector.

    Something like this could work:

    /*
     * our permutation indices: the MSbit from the first bool long long,
     * then the MSbit from the second bool long long, then the rest as
     * >=128 (which gives a zero bit in the result vector)
     */
    vector unsigned char perm = { 0, 64, 128, 128, 128, /*...*/};
    
    /* compare the two-item vector into two bools */
    vcmp = (vector unsigned char)vec_cmplt(a, b);
    
    /* select a bit from each of the result bools */
    result = vec_vbpermq(vcmp, perm);
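
To make the bit selection concrete, here is a plain C model of what vbpermq computes, as I read the description above. It is a scalar sketch for checking index choices on any machine, not the intrinsic itself. The names bperm_model and demo_mask are made up for illustration, and the source bytes are assumed to be in big-endian register order (byte 0 first), which you would need to revisit on a little-endian target.

```c
#include <stdint.h>
#include <string.h>

/*
 * Scalar model of vbpermq (hypothetical helper, not a real intrinsic).
 * src holds the 16 source bytes in big-endian register order, so bit 0
 * is the most significant bit of src[0] and bit 127 is the least
 * significant bit of src[15].
 */
uint64_t bperm_model(const uint8_t src[16], const uint8_t idx[16])
{
    uint64_t out = 0;

    for (int i = 0; i < 16; i++) {
        unsigned bit = 0;               /* indices >= 128 select a zero */

        if (idx[i] < 128) {
            uint8_t byte = src[idx[i] / 8];
            bit = (byte >> (7 - (idx[i] % 8))) & 1u;
        }
        out = (out << 1) | bit;         /* idx[0] ends up as bit 15 */
    }
    return out;                         /* 16 bits, zero-extended */
}

/*
 * Hypothetical demo: build an all-ones / all-zeros compare mask for two
 * 64-bit lanes and run it through the model with indices 0 and 64.
 */
uint64_t demo_mask(int lane0_true, int lane1_true)
{
    uint8_t perm[16];
    uint8_t vcmp[16];

    memset(perm, 128, sizeof perm);     /* default: select a zero bit */
    perm[0] = 0;                        /* MSbit of the first lane */
    perm[1] = 64;                       /* MSbit of the second lane */

    memset(vcmp, lane0_true ? 0xFF : 0x00, 8);
    memset(vcmp + 8, lane1_true ? 0xFF : 0x00, 8);

    return bperm_model(vcmp, perm);
}
```

In this model demo_mask(1, 0) returns 0x8000 and demo_mask(1, 1) returns 0xC000, so the two flags land in bits 15 and 14 of the 16-bit field.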
    

    Getting the int out of the result vector will depend on what you want to do with it. If you need that as is, a vec_extract(result, 0) might work, but since you're only interested in the top two bits of the result, you may be able to simplify the perm constant, and/or shift the result as appropriate.
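
As a rough sketch of the "shift the result" option: once the 16-bit field has been pulled out into a scalar (however you extract it on your target), the two flags are its top two bits, so a shift and mask is enough. The function names here are hypothetical, and the second variant assumes you want the x86 movemask convention, where element 0 ends up in bit 0.

```c
#include <stdint.h>

/* field: the 16-bit vbpermq result, already extracted into a scalar */
int packed_top2(uint64_t field)
{
    return (int)((field >> 14) & 0x3u);   /* x in bit 1, y in bit 0 */
}

/* same two flags in movemask order: first element in bit 0 */
int packed_movemask_order(uint64_t field)
{
    int first  = (int)((field >> 15) & 1u);
    int second = (int)((field >> 14) & 1u);

    return first | (second << 1);
}
```

The other option mentioned above, changing the perm constant, would amount to putting the two indices in the last two bytes of perm instead of the first two, so that the selected bits already sit in bits 1 and 0 and no shift is needed.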

    Also, be aware of endianness when interpreting your result.

    vbpermq is described in section 5.15 of the PowerISA.