Search code examples
c++armarm64stream-compactionsve

AArch64 SVE/2 - Left pack elements from list


I'm trying to implement a SIMD algorithm with AArch64 SVE (or SVE2) that takes a list of elements and only selects the ones that meet a certain condition. It's often called Left Packing (SSE/AVX/AVX-512), or Stream Compaction (CUDA)?

Is it possible to vectorize this operation with SVE?

Equivalent SQL and scalar code could look like this:

SELECT * FROM Book
WHERE Price < 42
int LeftPack_Scalar(int* input, int* output, int N, int limit)
{
    int outPos = 0;
    for (int i = 0; i < N; ++i) {
       if (input[i] < limit)
          out[outPos++] = input[i];
    }
    return outPos;
}

One could vectorize this in SIMD with AVX-512

int LeftPack_AVX512(int* input, int* output, int N, int limit)
{
    int outPos = 0;
    __m512i vlimit = mm512_load(limit);
    for (int i=0; i < N; i+=16) {
       __m512i vinput = mm512_load(input+i); 
       __mmask16 mask = mm512_cmp(vinput, vlimit); 
       mm512_mask_compressstore(output+outPos, mask, vinput);
       int count = mm512_popcnt(mask);
       outPos += count;
    }
    return outPos;
}

How can I implement it with AArch64 SVE? Is there a function similar to AVX-512 compress_store to compress sparse data?

int LeftPack_SVE(int* input, int* output, int N, int limit)
{
  // ...
}

Note: SVE has both gather and scatter, see the short introduction to the SVE for further details. But I couldn't find an equivalent SVE / 2 instruction to keep the relative order of the elements.


Solution

  • Use svcompact to compact the active elements, then a normal linear store to store the result.