Search code examples
Fastest way to horizontally sum SSE unsigned byte vector...

c++ x86 sse simd

Convert 16 bits mask to 16 bytes mask...

c++ c bit-manipulation sse intrinsics

SSE2 intrinsics - comparing unsigned integers...

c++ x86 sse simd intrinsics

"Instruction operands must be the same size" for MOVDQU from .data array...

assembly x86 masm sse

The correct way to sum two arrays with SSE2 SIMD in C++...

c++ arrays sum sse simd

Fast counting the number of set bits in __m128i register...

c sse simd sse2 hammingweight

Why movaps causes segmentation fault?...

assembly segmentation-fault sse memory-alignment att

how to set a int32 value at some index within an m128i with only SSE2?...

c++ sse simd intrinsics sse2

Load or shuffle a pair of floats with SIMD intrinsics for doubles?...

c sse simd intrinsics avx

SIMD: Bit-pack signed integers...

sse simd avx avx2 avx512

How to read optimally from an array (in memory) having array position from a vector?...

c++ arrays performance sse simd

Add a constant value to a xmm register in x86...

assembly x86 sse x87

cmpeqpd sometimes returns wrong values...

assembly floating-point sse avx denormal-numbers

GCC access memory above stack top...

assembly gcc x86-64 sse red-zone

Parallelizing inner loop with residual calculations in OpenMP with SSE vectorization...

c openmp sse pragma

First use of AVX 256-bit vectors slows down 128-bit vector and AVX scalar ops...

assembly x86-64 sse simd avx

Efficient sse shuffle mask generation for left-packing byte elements...

performance x86 sse shuffle simd

what is difference between *(__m128*)(&A) and (__m128)A...

c++ sse simd

_mm_max_ss has different behavior between clang and gcc...

c++ gcc x86 clang sse

_mm_load_si128 loads data in reverse order...

c sse simd sse2

Gcc misoptimises sse function...

c++ gcc sse intrinsics strict-aliasing

How to convert scalar code of the double version of VDT's Pade Exp fast_ex() approx into SSE2?...

c++ sse intrinsics sse2 exp

What is the "correct" way to go from avx/sse masks to avx512 masks?...

c++ sse avx avx512

SSE Compare Packed Unsigned Bytes...

x86 comparison unsigned sse

How to best emulate the logical meaning of _mm_slli_si128 (128-bit bit-shift), not _mm_bslli_si128...

c sse simd intrinsics sse2

What does ordered / unordered comparison mean?...

assembly x86 floating-point sse

Count integers in an array where the set bits are a subset of a given mask...

c++ optimization sse avx bitmask

Which are the use case of punpcklbw (interleave in MMX/SSE/AVX)?...

assembly compression sse disassembly memset

Better way to store or extract scalar int result using SSE2 intrinsic...

c sse intrinsics sse2

What is packed and unpacked and extended packed data...

cpu-architecture sse simd avx avx2
