How to use Fused Multiply-Add (FMA) instructions with SSE/AVX...
Tags: c, sse, cpu-architecture, avx, fma

Does SSE/AVX provide a means of determining if a result was rounded up?...
Tags: x86, rounding, sse, simd, avx

Best way to mask a single bit in AVX2?...
Tags: c, x86, simd, avx, avx2

How to efficiently perform double/int64 conversions with SSE/AVX?...
Tags: c++, floating-point, sse, simd, avx

What is the inverse of "_mm256_cvtepi16_epi32"...
Tags: x86, g++, intrinsics, avx, avx2

AVX2: Get every second int32...
Tags: c, simd, avx, avx2, int32

How to optimize cell-width measuring with SIMD (find the first column to have a non-zero byte in an ...
Tags: c, x86-64, simd, sse, avx

I need more performance for int8 vector multiplication (Intel AVX-512)...
Tags: performance, simd, avx, avx2, avx512

Efficient way for using int8 AVX512-VNNI instruction, especially about loading the data to zmm regis...
Tags: performance, intel, matrix-multiplication, avx, avx512

AVX 32-bit integer to double precision float best practice...
Tags: avx, avx2

Have I written these sha256 #define's the correct way?...
Tags: c, algorithm, sha256, avx, sha2

What is the difference between shuffle and permute...
Tags: x86, intel, simd, naming, avx

Load and duplicate 4 single precision float numbers into a packed __m256 variable with fewest instru...
Tags: c++, avx

Differences between AVX and AVX2...
Tags: x86, matrix-multiplication, simd, avx, avx2

Is this a gcc bug? Function returns 0 when looping an int* over elements of a __m256i...
Tags: c, gcc, x86, intrinsics, avx

SIMD: Accumulate Adjacent Pairs...
Tags: c++, sse, simd, intrinsics, avx

AVX512 assembly breaks when called concurrently from different goroutines...
Tags: go, assembly, avx, avx512

What are the best instruction sequences to generate vector constants on the fly?...
Tags: assembly, x86, sse, simd, avx

AVX2 integer shuffle with types other than byte?...
Tags: c#, avx, avx2

Why doesn't gcc resolve _mm256_loadu_pd as single vmovupd?...
Tags: assembly, gcc, compiler-optimization, simd, avx

How to understand this AVX addition of two _m256i variables?...
Tags: c++, vector, avx, avx2, avx512

Emulate AVX512 VPCOMPRESSB byte packing without AVX512_VBMI2...
Tags: x86-64, simd, avx, avx512

Shifting SSE/AVX registers 32 bits left and right while shifting in zeros...
Tags: x86, sse, simd, avx, avx2

Why gcc is so much worse at std::vector<float> vectorization of a conditional multiply than cl...
Tags: c++, gcc, vectorization, compiler-optimization, avx

Why won't simple code get auto-vectorized with SSE and AVX in modern compilers?...
Tags: c, optimization, sse, avx, auto-vectorization

How to run bitwise OR on big vectors of u64 in the most performant manner?...
Tags: c++, performance, assembly, cpu, avx

Using SIMD To Parallelize Matrix Multiplication For A 4x4, Row-Major Matrix...
Tags: c, matrix-multiplication, intrinsics, avx

AVX Intrinsic Clarification, 4x4 Matrix Multiplication Oddities...
Tags: c++, c, matrix-multiplication, avx

Converting between Pair-wise and Component-wise in AVX...
Tags: c, simd, avx, double-double-arithmetic

Squared Quaternion using AVX...
Tags: optimization, vectorization, quaternions, avx
