Search code examples
Seg fault while using _mm256_i64gather_pd...

c++ intrinsics avx avx2

perf report shows this function "__memset_avx2_unaligned_erms" has overhead. does this mea...

c++ profiling avx perf avx2

Xcode Apple Clang enable avx512...

xcode clang avx avx2 avx512

How to get data out of AVX registers?...

c++ visual-c++ avx fma

Is it safe to compile one source with SSE2 another with AVX architecture?...

visual-c++ sse intrinsics avx

Shuffling a vector by number of bytes...

c++ x86 sse intrinsics avx

why does gcc auto-vectorization for tigerlake use ymm not zmm registers...

c gcc avx avx512 auto-vectorization

What's the fastest way to perform an arbitrary 128/256/512 bit permutation using SIMD instructio...

c++ assembly sse avx avx2

SIMD Intrinsics AVX. Tried to use _mm256_mullo_epi64. But got 0xC000001D: Illegal Instruction except...

c++ exception simd avx avx2

Disabling AVX2 in CPU for testing purposes...

testing x86 avx instruction-set avx2

AVX512: Best way to combine horizontal sum and broadcast...

c intel avx avx512

ASM x86_64 AVX: xmm and ymm registers differences...

assembly nasm x86-64 avx

Unable to return multiple SIMD vectors using vectorcall...

c++ clang x86-64 avx calling-convention

QWORD shuffle sequential 7-bits to byte-alignment with SIMD SSE...AVX...

bit-manipulation simd sse avx varint

Convert 128 bit AVX register with 8-bit elements to two 256 bit registers with 32-bit elements...

performance x86 simd avx avx2

C++ compilers give different signs of NaN for constant propagation of subtracting +-Infinity or +-Na...

c++ gcc clang nan avx

Fastest way to implement _mm256_mullo_epi4 using AVX2...

c x86-64 intrinsics avx avx2

Simple AVX512 dot-product loop only 10.6x faster, expected 16x...

c++ performance avx dot-product avx512

How can I exchange the low 128 bits and high 128 bits in a 256 bit AVX (YMM) register...

x86 simd avx

Which contexts need to be saved in x86-64 with a c function return?...

c x86-64 avx abi context-switch

L1 Cache Usage in Optimised matrix multiplication micro-kernel in C++...

c++ optimization matrix-multiplication avx cpu-cache

What is the fastest way to calculate the logical_and (&&) between elements of two __m256i va...

c++ simd avx avx2 logical-and

How to load 128bit data to ymm register in assembly?...

assembly x86 avx avx2

Mixing SSE with AVX128 for shorter instructions?...

assembly x86 sse avx micro-optimization

Is it useful to use VZEROUPPER if your program+libraries contain no SSE instructions?...

performance assembly x86 avx micro-optimization

Can I check the values of XMM or YMM registers in Visual C++ breakpoint conditions?...

visual-studio masm visual-studio-debugging avx

Fastest Implementation of Exponential Function Using AVX...

x86 simd avx exponential avx2

Horizontal minimum and maximum using SSE...

c++ max sse minimum avx

How to display AVX registers as doubles with GDB?...

gdb simd sse cpu-registers avx

Can Apache web server make use of CPU AVX instructions?...

performance apache avx
