What's the point of _mm_cmpgt_sd and other similar methods?...

x86 sse simd intrinsics

What is the difference between _mm_movehdup_ps and _mm_shuffle_ps in this case?...

x86 sse intrinsics micro-optimization sse3

Loading into Array causes Stack Smashing while having enough space?...

c++ intrinsics avx avx512 stack-smash

How do I efficiently reorder bytes of a __m256i vector (convert int32_t to uint8_t)?...

c++ vectorization simd intrinsics avx2

Compiler errors for GCC (via CUDA) intrinsic functions, but I'm not using any...

c++ gcc compiler-errors cuda intrinsics

Summing vec4[idx[i]] * scalar[i] with YMM vector registers...

c++ simd intrinsics avx2

SSE: shuffle (permutevar) 4x32 integers...

sse simd intrinsics avx

Convert AoS to SoA in C using SIMD...

c arrays struct simd intrinsics

Most efficient way to check if all __m128i components are 0 [using <= SSE4.1 intrinsics]...

c++ integer sse simd intrinsics

Unresolved external symbol __aullshr when optimization is turned off...

c visual-c++ intrinsics bit-fields uefi

Segmentation fault (core dumped) when using avx on an array allocated with new[]...

c++11 codeblocks intrinsics avx

Missing AVX-512 intrinsics for masks?...

c gcc intrinsics icc avx512

__m256 unknown type (clang 5.1/i5 CPU)?...

c++ x86 clang++ intrinsics avx

How does dead code elimination of Math.log() work in JMH sample...

java intrinsics microbenchmark jmh

Computing 8 horizontal sums of eight AVX single-precision floating-point vectors...

optimization intrinsics avx low-level

cuda "rounding modes" of reciprocal functions...

api math cuda intrinsics

Why does _mm_mfence() produce counts for the ALL_LOADS perf event?...

c x86 intrinsics perf papi

How to detect rdtscp support in Visual C++?...

c++ visual-c++ x86 intrinsics rdtsc

What is the difference between loadu and load?...

assembly x86 sse simd intrinsics

unresolved external symbol __mm256_setr_epi64x...

c++ visual-studio-2012 intrinsics avx msvc12

_mm_lfence() time overhead is non deterministic?...

c performance x86 intrinsics rdtsc

How to move double in %rax into particular qword position on %ymm or %zmm? (Kaby Lake or later)...

c++ x86-64 inline-assembly intrinsics avx

FMA instruction showing up as three packed double operations?...

linear-algebra intrinsics perf

Why __m256 instead of 'float' gives more than x8 performance?...

c++ visual-c++ compiler-optimization sse intrinsics

How to floor/int in double using only SSE2?...

c++ simd truncate intrinsics sse2

What's the difference between __popcnt() and _mm_popcnt_u32()?...

x86 sse intrinsics sse4

ARM SVE Left-to-right vs. tree reduction...

arm intrinsics sve

What is the fastest way to convert a large c-array of char8 to short16?...

c++ c intel intrinsics

How do you process exp() with SSE2?...

c++ simd intrinsics sse2 exp

Move an int64_t to the high quadwords of an AVX2 __m256i vector...

c++ x86-64 simd intrinsics avx2
