Why is the generated assembly reordered when using intrinsics?
Tags: c, gcc, x86, sse, intrinsics

Auto-vectorizing: Convincing the compiler that an alias check is not necessary
Tags: c++, opencv, gcc, vectorization, sse

Is there a difference between SVML and normal intrinsic square root functions?
Tags: c++, intel, sse, intrinsics, sse2

Vectorizing with unaligned buffers: using VMASKMOVPS: generating a mask from a misalignment count? O...
Tags: gcc, assembly, x86, sse, avx

In GNU C inline asm, what are the size-override modifiers for xmm/ymm/zmm for a single operand?
Tags: c, gcc, sse, inline-assembly, avx512

Why don't GCC or Clang optimise the reciprocal to one instruction when using fast-math?
Tags: c++, sse, compiler-optimization, simd, fast-math

Why do SSE instructions preserve the upper 128-bit of the YMM registers?
Tags: performance, x86, simd, sse, avx

How many clock cycles does AVX/SSE exponentiation cost on a modern x86_64 CPU?
Tags: c++, x86, x86-64, sse, avx

How to best emulate the logical meaning of _mm_slli_si128 (128-bit bit-shift), not _mm_bslli_si128
Tags: c, sse, simd, intrinsics, sse2

Logarithm with SSE, or switch to FPU?
Tags: sse, simd, logarithm, natural-logarithm

Parallel prefix (cumulative) sum with SSE
Tags: c, sum, openmp, sse

How to compute sine values somewhere, and then move them into XMM0 in assembly?
Tags: assembly, x86, sse, x87, fpu

Why won't simple code get auto-vectorized with SSE and AVX in modern compilers?
Tags: c, optimization, sse, avx, auto-vectorization

How to use Fused Multiply-Add (FMA) instructions with SSE/AVX
Tags: c, sse, cpu-architecture, avx, fma

SSE4.1 slower than SSE3 on 4x4 matrix multiplication?
Tags: c++, matrix, simd, sse, matmul

Does SSE/AVX provide a means of determining if a result was rounded up?
Tags: x86, rounding, sse, simd, avx

Write access violation on read instruction (MOVQ load on old Athlon XP)
Tags: visual-c++, x86, sse, amd-processor, sse2

What series of intrinsics will complete this Paeth prediction code?
Tags: c++, sse, intrinsics

Calculating constants for CRC32 using PCLMULQDQ
Tags: sse, crc32, modular-arithmetic, galois-field

Classification of x86 instructions according to floating point rounding mode sensitivity?
Tags: assembly, floating-point, x86-64, sse, rounding-error

Why do x86 FP compares set CF like unsigned integers, instead of using signed conditions?
Tags: assembly, x86, sse, sse2, x87

Intel x86_64 assembly: comparing signed double-precision floats
Tags: assembly, x86-64, intel, precision, sse

How to efficiently perform double/int64 conversions with SSE/AVX?
Tags: c++, floating-point, sse, simd, avx

Is there a way to utilize all XMM registers?
Tags: c++, c, sse, cpu-registers

Output errors when manually using libmvec intrinsics for trigonometric functions (like cosf)
Tags: c++, gcc, glibc, sse, intrinsics

How to optimize cell-width measuring with SIMD (find the first column to have a non-zero byte in an ...
Tags: c, x86-64, simd, sse, avx

Is it worth using SSE, or should I just rely on the compiler?
Tags: c++, optimization, intel, simd, sse

Accelerate CRC32b using Intel processors
Tags: x86, intel, sse, crc32

Why does .NET use SIMD and not x87 for math operations not intrinsic to SIMD?
Tags: .net, assembly, simd, sse, x87

Why is SSE4.2 cmpstr slower than regular code?
Tags: c, performance, assembly, x86, sse
