Is it possible to popcount __m256i and store result in 8 32-bit words instead of the 4 64-bit using ...

Tags: c++, intel, sse, avx, avx2

Does gcc use Intel's SSE 4.2 instructions for text processing if available?...

Tags: c++, c, gcc, sse, simd

Count the number of unique values in a 128-bit AVX vector, or detect whether all elements are equal?...

Tags: c, simd, sse, intrinsics, avx

Slow SIMD performance - no inlining...

Tags: rust, simd, sse, avx2

Count the number of matching bytes between two __m128i SIMD vectors...

Tags: c++, bioinformatics, sse, simd, hamming-distance

What is the difference between MOVDQA and MOVNTDQA, and VMOVDQA and VMOVNTDQ for WB/WC marked region...

Tags: assembly, x86, sse, simd, avx

Why does SSE set (_mm_set_ps) reverse the order of arguments...

Tags: c++, c, simd, sse, intrinsics

Accessing the fields of a __m128i variable in a portable way...

Tags: simd, sse

Replace `movss xmm0, cs:dword_5B27420` with `movss xmm0, immediate`...

Tags: assembly, x86, reverse-engineering, sse, immediate-operand

Where can I find an official reference listing the operation of SSE intrinsic functions?...

Tags: c++, c, gcc, sse, simd

128-bit values - From XMM registers to General Purpose...

Tags: assembly, x86, sse

Using ymm registers as a "memory-like" storage location...

Tags: assembly, x86, sse, avx

What instruction set does SFENCE belong to?...

Tags: assembly, x86-64, sse, amd-processor, mmx

Why does Clang complain about alignment on SSE intrinsic unaligned loads...

Tags: clang, sse, intrinsics, memory-alignment

SSE, SSE2 and SSE3 for GNU C++...

Tags: c++, optimization, simd, sse, sse2

Why does MSVC use SSE2 instruction for such trivial thing?...

Tags: optimization, visual-c++, x86, sse, fpu

What is the minimum supported SSE flag that can be enabled on macOS?...

Tags: c++, macos, optimization, compiler-optimization, sse

How do I more efficiently multiply A*B^T or A^T*B^T (T for transpose) matrices using SSE?...

Tags: c++, c, transpose, matrix-multiplication, sse

How does strncmp using SSE 4.2 avoid reading beyond the page boundaries when loading 16 bytes?...

Tags: memory, x86, valgrind, sse, glibc

How to detect SSE/SSE2/AVX/AVX2/AVX-512/AVX-128-FMA/KCVI availability at compile-time?...

Tags: gcc, clang, sse, avx, avx512

x64 logical AND of packed 32-bit floating-point values...

Tags: assembly, x86, sse

simd: round up (ceil) the log2 of an input, while clamping negative logs to zero?...

Tags: c++, rounding, simd, sse, unsigned

What are the names and meanings of the intrinsic vector element types, like epi64x or pi32?...

Tags: intel, sse, intrinsics, sse2, mmx

How to constexpr initialize intrinsic SSE/AVX register?...

Tags: c++, sse, constexpr, intrinsics, avx

Is there a way to cast integers to bytes, knowing these ints are in range of bytes. Using SSE?...

Tags: assembly, x86-64, masm, sse, sse4

What is the difference between these 128bit SIMD xor operations...

Tags: simd, sse, intrinsics, sse2

sse2 instruction set not enabled...

Tags: g++, sse

Find largest element in matrix and its column and row indexes using SSE and AVX...

Tags: c++, matrix, sse, avx, avx2

Why doesn't gcc zero the upper values of an XMM register when only using the lower value with SS...

Tags: c, assembly, x86, sse, calling-convention

_mm_load_ps caused segment fault...

Tags: c++, x86, sse, simd, memory-alignment
