Search code examples
extract non-zero elements from __m512i/__m256i vector...


simd, intrinsics, avx2, avx512

Relation between Avx512_fp16 and Avx512bw (on non-Intel machines)...


x86, avx512

Setting AVX512 vector to zero/non-zero sometimes causes signal SIGILL on Godbolt...


c++, intel, avx512, compiler-explorer, godbolt

AVX 512 intrinsics to add 512 bits of 128 bit elements...


optimization, x86, intel, simd, avx512

How to perform parallel addition using AVX with carry (overflow) fed back into the same element (PE ...


c, simd, avx, avx2, avx512

Determine number of AVX-512 FMA units...


c++, avx512

how can I optimize this simple multi-valued simd splat/broadcast?...


rust, avx512

AVX-512 BF16: load bf16 values directly instead of converting from fp32...


c, intrinsics, avx512, half-precision-float

Problem with AVX-512 code optimization (NASM)...


assembly, x86, cpu-registers, avx512

AVX512 perform AND of 512bits of 8-bit chars...


c++, x86, bitwise-operators, intrinsics, avx512

Optimal instruction sequence for AVX512 gather of 4D vectors...


c++, vectorization, intel, simd, avx512

bitwise shift in AVX512...


c++, optimization, intrinsics, avx, avx512

`vmovdqu8` / 16 / 32 / 64 instructions and `_mm_loadu_epi8` / 16 / 32 / 64 intrinsics purpose...


x86, intrinsics, avx512

How to load uint8_t "as" 32 bits integer efficiently into a SIMD register?...


c++, simd, avx512

Packed bit test for __m512...


x86-64, intrinsics, avx512

How to call _mm256_mul_ph from rust?...


rust, intrinsics, avx512, half-precision-float

simd find first element greater than x...


c++, simd, avx512

Is there any performance difference between AVX-512 `_mm512_load_epi64` and `_mm512_loadu_epi64`?...


x86-64, intel, simd, amd-processor, avx512

Getting Illegal Instruction while running a basic Avx512 code...


c++, x86, avx, instruction-set, avx512

AVX512 auto-vectorized C++ matrix-vector functions are much slower when source = destination, in-pla...


c++, assembly, x86-64, avx512, auto-vectorization

How to convert a binary integer number to a hex string?...


assembly, x86, hex, simd, avx512

What is the difference between "mask_mov" and "mask_blend" when using intrinsics...


intrinsics, avx512

Collapse __mask64 aka 64-bit integer value, counting nibbles that have all bits set?...


c++, bit-manipulation, avx, avx512

Performance Difference Between _mm512_load_si512 and _mm512_stream_load_si512...


simd, avx512

.NET8 supports Vector512, but why doesn't Vector reach 512 bits?...


c#, simd, intrinsics, avx512, .net-8.0

SIMD algorithm to check if an integer block is "consecutive."...


rust, simd, avx, avx512

Unable to get correct rounding mode code for `vrndscalepd`...


assembly, floating-point, x86-64, nasm, avx512

Why adding vmovapd instruction makes simd vectorized code run faster?...


assembly, simd, microbenchmark, avx512

What are the AVX-512 Galois-field-related instructions for?...


avx512, galois-field

x86-64 SIMD mechanism to "compare" 8-bit unsigned integers, giving a vector of +1 / 0 / -1...


simd, avx, avx2, avx512
