How does SIMD (AVX) processing work? For example, if I want 10 32-bit floats, how do I fit them in a 256 b...

Tags: c, simd, avx

Why is my SIMD vector plus and set slower than using std::transform and std::plus<T>? Am I do...

Tags: c++, vector, vectorization, simd, avx

SSE4.1 slower than SSE3 on 4x4 matrix multiplication?...

Tags: c++, matrix, simd, sse, matmul

Why does _mm256_unpacklo "jump" a double-word and where does it say so in the documentati...

Tags: c++, simd, intrinsics, avx2

Does SSE/AVX provide a means of determining if a result was rounded up?...

Tags: x86, rounding, sse, simd, avx

Are SIMD and VLIW instructions the same thing?...

Tags: x86, cpu-architecture, simd, instruction-set, vliw

Pack high bit of every byte in ARM, for 64 bytes like AVX512 vpmovb2m?...

Tags: c, arm, simd, arm64, neon

SIMD load across memory boundary doesn't cause segfault?...

Tags: c++, segmentation-fault, undefined-behavior, simd, intrinsics

Best way to mask a single bit in AVX2?...

Tags: c, x86, simd, avx, avx2

Do all processors supporting AVX2 support F16C?...

Tags: x86, x86-64, simd, avx2, half-precision-float

Is there a way to convert an integer to 1 if it is >= 1 without using any relational operator?...

Tags: c, math, boolean, logical-operators, simd

How to efficiently perform double/int64 conversions with SSE/AVX?...

Tags: c++, floating-point, sse, simd, avx

AVX2: Get every second int32...

Tags: c, simd, avx, avx2, int32

Storing and retrieving a number from C++23 experimental simd gives a random result...

Tags: c++, simd, c++23, c++-experimental

Invert a FloatVector (1/each element)...

Tags: java, simd

How to avoid an if statement? The compiler cannot optimize it to SIMD...

Tags: c, if-statement, visual-studio-2012, simd, auto-vectorization

How to optimize cell-width measuring with SIMD (find the first column to have a non-zero byte in an ...

Tags: c, x86-64, simd, sse, avx

I need more performance for int8 vector multiplication (Intel AVX-512)...

Tags: performance, simd, avx, avx2, avx512

Is it worth using SSE, or should I just rely on the compiler?...

Tags: c++, optimization, intel, simd, sse

Accelerating matrix vector multiplication with ARM Neon Intrinsics on Raspberry Pi 4...

Tags: c++, raspberry-pi, arm, simd, neon

Generate FMOV without inline assembly...

Tags: clang, simd, arm64, micro-optimization, sve

Failed to use GNU MIPS builtin functions of vector (SIMD)...

Tags: c, mips, gnu, simd, intrinsics

AVX2 / gcc: Improve CPU-level parallelism by using different registers...

Tags: gcc, vectorization, cpu-architecture, simd, avx2

Accumulate vector using Neon and print to stdout (assembly)...

Tags: assembly, simd, arm64, neon, apple-silicon

Why does .NET use SIMD and not x87 for math operations not intrinsic to SIMD?...

Tags: .net, assembly, simd, sse, x87

Is batching the same functions with SIMD instructions possible?...

Tags: gcc, x86, llvm, simd

How to vectorise multiplication of an int8 array by an int16 constant, widening to int32 result arra...

Tags: c, x86, simd, intrinsics, avx2

Emulating byte-shifts on 32 bytes with AVX (lane-crossing)...

Tags: c++, simd, intrinsics, sse2, avx2

What is the difference between shuffle and permute...

Tags: x86, intel, simd, naming, avx

Differences between AVX and AVX2...

Tags: x86, matrix-multiplication, simd, avx, avx2
