Search code examples
Fallback implementation for conflict detection in AVX2...


c++, x86, intrinsics, avx2, avx512

Efficient way for using int8 AVX512-VNNI instruction, especially about loading the data to zmm regis...


performance, intel, matrix-multiplication, avx, avx512

I need more performance for int8 vector multiplication (Intel AVX-512)...


performance, simd, avx, avx2, avx512

Enabling AVX512 support on compilation significantly decreases performance...


linux, performance, gcc, x86-64, avx512

AVX512 assembly breaks when called concurrently from different goroutines...


go, assembly, avx, avx512

How to understand this AVX addition of two _m256i variables?...


c++, vector, avx, avx2, avx512

Emulate AVX512 VPCOMPRESSB byte packing without AVX512_VBMI2...


x86-64, simd, avx, avx512

Multiply vectors of 32 bit integers, taking only high 32 bits...


c++, intrinsics, low-level, avx512

What is the alternative method for Avx2.MoveMask in Vector512<T>...


c#, simd, avx512

extract non-zero elements from __m512i/__m256i vector...


simd, intrinsics, avx2, avx512

Relation between Avx512_fp16 and Avx512bw (on non-Intel machines)...


x86, avx512

Setting AVX512 vector to zero/non-zero sometimes causes signal SIGILL on Godbolt...


c++, intel, avx512, compiler-explorer, godbolt

AVX 512 intrinsics to add 512 bits of 128 bit elements...


optimization, x86, intel, simd, avx512

How to perform parallel addition using AVX with carry (overflow) fed back into the same element (PE ...


c, simd, avx, avx2, avx512

Determine number of AVX-512 FMA units...


c++, avx512

how can I optimize this simple multi-valued simd splat/broadcast?...


rust, avx512

AVX-512 BF16: load bf16 values directly instead of converting from fp32...


c, intrinsics, avx512, half-precision-float

Problem with AVX-512 code optimization (NASM)...


assembly, x86, cpu-registers, avx512

AVX512 perform AND of 512bits of 8-bit chars...


c++, x86, bitwise-operators, intrinsics, avx512

Optimal instruction sequence for AVX512 gather of 4D vectors...


c++, vectorization, intel, simd, avx512

bitwise shift in AVX512...


c++, optimization, intrinsics, avx, avx512

`vmovdqu8` / 16 / 32 / 64 instructions and `_mm_loadu_epi8` / 16 / 32 / 64 intrinsics purpose...


x86, intrinsics, avx512

How to load uint8_t "as" 32 bits integer efficiently into a SIMD register?...


c++, simd, avx512

Packed bit test for __m512...


x86-64, intrinsics, avx512

How to call _mm256_mul_ph from rust?...


rust, intrinsics, avx512, half-precision-float

simd find first element greater than x...


c++, simd, avx512

Is there any performance difference between AVX-512 `_mm512_load_epi64` and `_mm512_loadu_epi64`?...


x86-64, intel, simd, amd-processor, avx512

Getting Illegal Instruction while running a basic Avx512 code...


c++, x86, avx, instruction-set, avx512

AVX512 auto-vectorized C++ matrix-vector functions are much slower when source = destination, in-pla...


c++, assembly, x86-64, avx512, auto-vectorization

How to convert a binary integer number to a hex string?...


assembly, x86, hex, simd, avx512
