AVX2 Examples and Free Source Code

Efficient AVX2 implementation of a 17x17-bit squaring operation with result truncation...

algorithm assembly bit-manipulation micro-optimization avx2

AVX2 repack an array of structs of 5 ints to structs of 7 ints, with the extra elements from other a...

c++ simd avx2 avx512

How to improve performance of a packed yuv to planar yuv conversion using avx2?...

c++ x86-64 simd avx2

How to fix a warning (ignoring attributes) with a `vector` of `__m256`...

c++ c++17 intrinsics memory-alignment avx2

Why does _mm256_unpacklo "jump" a double-word and where does it say so in the documentati...

c++simd intrinsics avx2

Best way to mask a single bit in AVX2?...

c x86 simd avx avx2

Do all processors supporting AVX2 support F16C?...

x86 x86-64 simd avx2 half-precision-float

What is the inverse of "_mm256_cvtepi16_epi32"...

x86 g++ intrinsics avx avx2

AVX2: Get every second int32...

c simd avx avx2 int32

I need more performance for int8 vector multiplication (Intel AVX-512)...

performance simd avx avx2 avx512

Counting 1 bits (population count) on large data using AVX-512 or AVX-2...

assembly avx2 avx512 bitcount population-count

Fallback implementation for conflict detection in AVX2...

c++ x86 intrinsics avx2 avx512

AVX2 / gcc: Improve CPU-level parallelism by using different registers...

gcc vectorization cpu-architecture simd avx2

How to vectorise multiplication of an int8 array by an int16 constant, widening to int32 result arra...

c x86 simd intrinsics avx2

How to implement lane crossing logical bit-wise shift/rotate (left and right) in AVX2...

c++ c bit-shift avx2

Emulating byte-shifts on 32 bytes with AVX (lane-crossing)...

c++ simd intrinsics sse2 avx2

AVX 32-bit integer to double precision float best practice...

avx avx2

Differences between AVX and AVX2...

x86 matrix-multiplication simd avx avx2

How to reorder interleaved 8-bit values across AVX2 lanes efficiently?...

c++ avx2

C++ to C# memory alignment issue...

c# c++ memory-management simd avx2

AVX2 integer shuffle with types other than byte?...

c# avx avx2

How to understand this AVX addition of two __m256i variables?...

c++ vector avx avx2 avx512

Shifting SSE/AVX registers 32 bits left and right while shifting in zeros...

x86 sse simd avx avx2

AVX2 what is the most efficient way to pack left based on a mask?...

c++ vectorization sse simd avx2

extract non-zero elements from __m512i/__m256i vector...

simd intrinsics avx2 avx512

Why does msvc not vectorize?...

visual-c++ x86-64 vectorization avx2 auto-vectorization

AVX2 code to find the first longest match of 4-byte string among 8 4-byte targets...

bit-manipulation simd avx avx2 lz77

How to perform parallel addition using AVX with carry (overflow) fed back into the same element (PE ...

c simd avx avx2 avx512

Why does '_mm256_fmadd_ps' cause precision loss?...

c precision avx avx2 fma

6-bit lookup using SIMD AVX2...

c++ rust simd avx2