Squared Quaternion using AVX...

optimization vectorization quaternions avx

AVX2 code to find the first longest match of 4-byte string among 8 4-byte targets...

bit-manipulation simd avx avx2 lz77

How to perform parallel addition using AVX with carry (overflow) fed back into the same element (PE ...

c simd avx avx2 avx512

Is there an ARM Neon Gather Instruction?...

c++ arm simd avx neon

Why does '_mm256_fmadd_ps' cause precision loss?...

c precision avx avx2 fma

Unknown type name __m256 - Intel intrinsics for AVX not recognized?...

c++ c intel intrinsics avx

AVX MaskLoad/MaskStore performance...

c# simd avx

gcc: Optimize single function with `-mavx -mprefer-avx128`...

c gcc compiler-optimization avx

AVX2 consuming bytes whilst producing uints?...

c# simd intrinsics avx

FLOPs per cycle for Sandy Bridge and Haswell and others SSE2 / AVX / AVX2 / AVX-512...

cpu intel cpu-architecture avx flops

Compiling legacy GCC code with AVX vector warnings...

c++ gcc avx

Unpacking nibbles to bytes - Direct instructions/ Efficient Way to implement and keep sign...

c++ simd avx avx2 sign-extension

Horizontal XOR in AVX...

c++ assembly x86 simd avx

Comparing Unsigned integers using AVX2 Intrinsics...

c++ assembly intrinsics avx avx2

Using a variable to index a simd vector with _mm256_extract_epi32() intrinsic...

simd intrinsics avx avx2

Set Last Value in __m128 vector register...

c++ simd sse avx

Is there anything more I need to do before using SSE instructions?...

assembly x86 simd sse avx

Can std::sort, std::accumulate, std::memcpy be vectorized because of -mavx / -mavx2 flag?...

c++ x86 vectorization avx avx2

bitwise shift in AVX512...

c++ optimization intrinsics avx avx512

How can I optimize search in small fixed size array?...

rust vectorization avx

How does MSVC avoid mixing SSE and AVX?...

c++ visual-c++ sse avx

How to solve the 32-byte-alignment issue for AVX load/store operations?...

c++ sse simd memory-alignment avx

Can std::replace implementation make redundant writes to the passed array?...

c++ language-lawyer vectorization sse avx

AVX2 vectorization for code similar to prefix sum (decrement by count of preceding matches in short ...

simd avx bitmask avx2 prefix-sum

Can AVX2 implement faster LZCNT processing on a word array?...

x86 simd avx micro-optimization avx2

Fastest way to mask out bytes higher than separator position with SIMD...

c++ assembly optimization simd avx

Getting an illegal instruction error while running basic AVX-512 code...

c++ x86 avx instruction-set avx512

Is there an efficient way to get the first non-zero element in an SIMD register using SIMD intrinsic...

x86 bit-manipulation simd intrinsics avx

What makes numpy.sum faster than an optimized (auto-vectorized) C loop?...

c numpy floating-point compiler-optimization avx

Do all CPUs which support AVX2 also support SSE4.2 and AVX?...

sse simd avx avx2
