Search code examples
Shuffling a vector by number of bytes...

c++ x86 sse intrinsics avx

why does _mm_mulhrs_epi16() always do biased rounding to positive infinity?...

rounding multiplication simd sse

Loading XMM registers from address location...

c++ assembly sse cpu-registers

What's the fastest way to perform an arbitrary 128/256/512 bit permutation using SIMD instructio...

c++ assembly sse avx avx2

Can counting byte matches between two strings be optimized using SIMD?...

c++ optimization x86-64 sse simd

Extract the low bit of each bool byte in a __m128i? bool array to packed bitmap...

c++ gcc sse intrinsics

What does "SSE 4.2 insanity" mean in the "if consteval" proposal paper?...

c++ sse c++23 sse4

SSE 4.2: alternative to _mm_cmpistri...

c++ sse sse4

Why does __m128 cause alignment issues in a union with float x/y/z?...

c simd sse unions memory-alignment

Most insanely fast way to convert 9 char digits into an int or unsigned int...

c++ assembly optimization x86-64 sse

Get SSE version without __asm on x64...

c++ assembly visual-c++ sse cpuid

Optimizing variable-length encoding...

c++ c assembly sse varint

QWORD shuffle sequential 7-bits to byte-alignment with SIMD SSE...AVX...

bit-manipulation simd sse avx varint

Out-of-range floating point to integer conversion breaks in VS2022 executable when linking VS2017 or...

c visual-c++ floating-point sse floating-point-conversion

How to check if even/odd lanes are in given ranges using SIMD?...

x86 simd sse

XMM register 0 not being used in Intel instruction documentation...

assembly x86 intel sse

Semantics of mov widths in x64 and SSE...

assembly x86-64 sse freepascal

_mm_comieq_ss difference between Clang and GCC...

c++ gcc clang simd sse

Estimating Cycles Per Instruction...

performance assembly architecture x86 sse

Mixing SSE with AVX128 for shorter instructions?...

assembly x86 sse avx micro-optimization

Meaning of XMM register values shown in Visual Studio debugger's register window...

visual-studio sse visual-studio-debugging cpu-registers

Fast CRC with PCLMULQDQ *NOT* reflected...

assembly sse crc crc32

SSE multiplication 16 x uint8_t...

x86 sse simd sse4

Horizontal minimum and maximum using SSE...

c++ max sse minimum avx

How to display AVX registers as doubles with GDB?...

gdb simd sse cpu-registers avx

How to calculate 2x2 matrix multiplied by 2D vector using SSE intrinsics (32 bit floating points)? (...

c++ optimization matrix-multiplication sse intrinsics

Getting max value in a __m128i vector with SSE?...

c assembly x86 sse

Fast pyrDown image with AVX instructions...

c++ image-processing computer-vision sse avx

How to enable SSE3 addsubps autovectorization for complex numbers in gcc?...

c gcc sse complex-numbers auto-vectorization

How to dump all the XMM registers in gdb?...

x86 gdb simd sse cpu-registers
