sse avx avx2

SSE vector realign?


Is there a way to realign data that has already been loaded into SSE/AVX vector registers (say, to implement a sliding window)? Or do I need to shift the bytes myself and reload into vector registers from memory again?


Solution

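  • For 128-bit vectors, SSSE3 / AVX [v]palignr xmm extracts an arbitrary 16-byte window from a pair of registers; the byte offset has to be an immediate (compile-time constant). For AVX2 ymm registers, vpalignr operates on the two 128-bit lanes independently, and that in-lane behaviour is nearly useless for this. See "_mm_alignr_epi8 (PALIGNR) equivalent in AVX2".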

    Sometimes reloading from memory is better, though: on Intel, loads run at 2 per clock with no penalty as long as they don't cross a cache-line boundary, vs. 1 per clock for shuffles such as palignr. And the throughput / latency penalty for cache-line splits isn't terrible. If a single palignr is sufficient, it's usually the right choice, but for AVX2 it's usually better to do unaligned loads than to emulate a 256-bit palignr with several shuffles.
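To make the two options concrete, here is a minimal C sketch of the 128-bit case, assuming an x86-64 target and GCC or Clang. The function names (window5_palignr, window5_reload, windows_agree), the USE_SSSE3 macro, and the offset 5 are made up for illustration. It builds the same 16-byte window once with palignr on two registers that are already loaded, and once with a plain unaligned load, then checks that the results match.

```c
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_storeu_si128 */
#include <tmmintrin.h>  /* SSSE3: _mm_alignr_epi8 (palignr) */
#include <stdint.h>
#include <string.h>

/* Assumption: GCC/Clang. This enables SSSE3 for one function so the file
 * also builds without -mssse3; other compilers need their own switch. */
#define USE_SSSE3 __attribute__((target("ssse3")))

/* Option 1: both 16-byte halves are already in registers; palignr
 * extracts the window that starts at byte offset 5. The offset must be
 * a compile-time constant. */
USE_SSSE3
static void window5_palignr(const uint8_t *src, uint8_t out[16])
{
    __m128i lo = _mm_loadu_si128((const __m128i *)src);        /* bytes 0..15  */
    __m128i hi = _mm_loadu_si128((const __m128i *)(src + 16)); /* bytes 16..31 */
    /* Concatenate hi:lo, shift right by 5 bytes, keep the low 16 bytes. */
    __m128i w = _mm_alignr_epi8(hi, lo, 5);
    _mm_storeu_si128((__m128i *)out, w);
}

/* Option 2: skip the shuffle and reload from memory at the new offset.
 * Here the offset could just as well be a runtime value. */
static void window5_reload(const uint8_t *src, uint8_t out[16])
{
    __m128i w = _mm_loadu_si128((const __m128i *)(src + 5));
    _mm_storeu_si128((__m128i *)out, w);
}

/* Returns 1 if both approaches produce bytes 5..20 of the buffer. */
static int windows_agree(void)
{
    uint8_t buf[32], a[16], b[16];
    for (int i = 0; i < 32; i++)
        buf[i] = (uint8_t)i;
    window5_palignr(buf, a);
    window5_reload(buf, b);
    return memcmp(a, b, 16) == 0 && a[0] == 5 && a[15] == 20;
}
```

The reload version extends directly to 256-bit vectors with _mm256_loadu_si256, which is why it tends to be the simpler choice for AVX2, where no single shuffle gives a window that crosses the 128-bit lanes.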