Efficient AVX2 implementation of a 17x17-bit squaring operation with result truncation...


algorithm assembly bit-manipulation micro-optimization avx2

AVX2 repack an array of structs of 5 ints to structs of 7 ints, with the extra elements from other a...


c++ simd avx2 avx512

How to improve performance of a packed yuv to planar yuv conversion using avx2?...


c++ x86-64 simd avx2

How to fix a warning (ignoring attributes) with a `vector` of `__m256`...


c++ c++17 intrinsics memory-alignment avx2

Why does _mm256_unpacklo "jump" a double-word and where does it says so in the documentati...


c++ simd intrinsics avx2

Best way to mask a single bit in AVX2?...


c x86 simd avx avx2

Do all processors supporting AVX2 support F16C?...


x86 x86-64 simd avx2 half-precision-float

What is the inverse of "_mm256_cvtepi16_epi32"...


x86 g++ intrinsics avx avx2

AVX2: Get every second int32...


c simd avx avx2 int32

I need more performance for int8 vector multiplication (Intel AVX-512)...


performance simd avx avx2 avx512

Counting 1 bits (population count) on large data using AVX-512 or AVX-2...


assembly avx2 avx512 bitcount population-count

Fallback implementation for conflict detection in AVX2...


c++ x86 intrinsics avx2 avx512

AVX2 / gcc: Improve CPU-level parallelism by using different registers...


gcc vectorization cpu-architecture simd avx2

How to vectorise multiplication of an int8 array by an int16 constant, widening to int32 result arra...


c x86 simd intrinsics avx2

How to implement lane crossing logical bit-wise shift/rotate (left and right) in AVX2...


c++ c bit-shift avx2

Emulating byte-shifts on 32 bytes with AVX (lane-crossing)...


c++ simd intrinsics sse2 avx2

AVX 32-bit integer to double precision float best practice...


avx avx2

Differences between AVX and AVX2...


x86 matrix-multiplication simd avx avx2

How to reorder interleaved 8-bit values across AVX2 lanes efficiently?...


c++ avx2

C++ to C# memory alignment issue...


c# c++ memory-management simd avx2

AVX2 integer shuffle with types other than byte?...


c# avx avx2

How to understand this AVX addition of two _m256i variables?...


c++ vector avx avx2 avx512

Shifting SSE/AVX registers 32 bits left and right while shifting in zeros...


x86 sse simd avx avx2

AVX2 what is the most efficient way to pack left based on a mask?...


c++ vectorization sse simd avx2

extract non-zero elements from __m512i/__m256i vector...


simd intrinsics avx2 avx512

Why does msvc not vectorize?...


visual-c++ x86-64 vectorization avx2 auto-vectorization

AVX2 code to find the first longest match of 4-byte string among 8 4-byte targets...


bit-manipulation simd avx avx2 lz77

How to perform parallel addition using AVX with carry (overflow) fed back into the same element (PE ...


c simd avx avx2 avx512

Why does '_mm256_fmadd_ps' cause precision loss?...


c precision avx avx2 fma

6-bit lookup using SIMD AVX2...


c++ rust simd avx2
