Check XMM register for all zeroes...


c++ sse simd intrinsics

OpenMP vectorised code runs way slower than O3 optimized code...


c++ gcc openmp vectorization simd

No insert and extract for float/double in SSE and AVX?...


c++ floating-point sse simd avx

AVX-optimized addition of two vectors containing only 3 elements...


optimization x86 simd avx

How to load 16 bytes of memory into a Rust __m128i?...


rust sse simd intrinsics

How to speed up this histogram of LUT lookups?...


c++ optimization histogram simd

Vector double-double floating point arithmetic...


floating-point vectorization precision simd double-double-arithmetic

Why is SIMD slower than scalar counterpart...


assembly x86 sse simd

How can I know whether my CPU shares the vector registers among the cores or each core has its priva...


multithreading cpu-architecture simd cpu-registers xeon-phi

When does data move around between SSE registers and the stack?...


c++ sse simd cpu-registers register-allocation

VLD2 structure load of a stricter alignment type...


c simd intrinsics memory-alignment neon

error: reduction variable is private in outer context (omp reduction)...


c++ parallel-processing openmp simd

How to extract bytes from an SSE2 __m128i structure?...


c image-processing vectorization simd sse2

Fast byte-wise replace if...


c optimization x86 sse simd

Writing a vector sum function with SIMD (System.Numerics) and making it faster than a for loop...


c# arrays performance simd avx

shuffling upper 32 bits with lower 32 bits in m128...


c sse simd intrinsics

Fastest way to horizontally sum SSE unsigned byte vector...


c++ x86 sse simd

SSE2 intrinsics - comparing unsigned integers...


c++ x86 sse simd intrinsics

The correct way to sum two arrays with SSE2 SIMD in C++...


c++ arrays sum sse simd

AVX-512 - How to gather data from memory using assembly instruction?...


c++ assembly nasm simd avx512

Fast counting the number of set bits in __m128i register...


c sse simd sse2 hammingweight

Pack (with saturation) __m256i of 16-bit values to __m128i of 8-bit values?...


x86 simd avx avx2

How to get AVX512 in C#?...


c# simd avx avx512

Convert "__m256 with random-bits" into float values of [0, 1] range...


c++ random floating-point simd avx

String length function is unstable...


c simd memory-alignment avx strlen

Searching for the key using SIMD...


c simd avx

how to set a int32 value at some index within an m128i with only SSE2?...


c++ sse simd intrinsics sse2

Load or shuffle a pair of floats with SIMD intrinsics for doubles?...


c sse simd intrinsics avx

SIMD vectorization strategies for group-by operations on multiple, very large data arrays...


c# performance x86 simd intrinsics

SIMD: Bit-pack signed integers...


sse simd avx avx2 avx512
