how can I optimize this simple multi-valued simd splat/broadcast?...
Tags: rust, avx512

AVX-512 BF16: load bf16 values directly instead of converting from fp32...
Tags: c, intrinsics, avx512, half-precision-float

Problem with AVX-512 code optimization (NASM)...
Tags: assembly, x86, cpu-registers, avx512

AVX512 perform AND of 512 bits of 8-bit chars...
Tags: c++, x86, bitwise-operators, intrinsics, avx512

Optimal instruction sequence for AVX512 gather of 4D vectors...
Tags: c++, vectorization, intel, simd, avx512

bitwise shift in AVX512...
Tags: c++, optimization, intrinsics, avx, avx512

`vmovdqu8` / 16 / 32 / 64 instructions and `_mm_loadu_epi8` / 16 / 32 / 64 intrinsics purpose...
Tags: x86, intrinsics, avx512

How to load uint8_t "as" 32-bit integers efficiently into a SIMD register?...
Tags: c++, simd, avx512

Packed bit test for __m512...
Tags: x86-64, intrinsics, avx512

How to call _mm256_mul_ph from Rust?...
Tags: rust, intrinsics, avx512, half-precision-float

simd find first element greater than x...
Tags: c++, simd, avx512

Is there any performance difference between AVX-512 `_mm512_load_epi64` and `_mm512_loadu_epi64`?...
Tags: x86-64, intel, simd, amd-processor, avx512

Getting Illegal Instruction while running a basic AVX512 code...
Tags: c++, x86, avx, instruction-set, avx512

AVX512 auto-vectorized C++ matrix-vector functions are much slower when source = destination, in-pla...
Tags: c++, assembly, x86-64, avx512, auto-vectorization

How to convert a binary integer number to a hex string?...
Tags: assembly, x86, hex, simd, avx512

What is the difference between "mask_mov" and "mask_blend" when using intrinsics...
Tags: intrinsics, avx512

Collapse __mmask64 aka 64-bit integer value, counting nibbles that have all bits set?...
Tags: c++, bit-manipulation, avx, avx512

Performance Difference Between _mm512_load_si512 and _mm512_stream_load_si512...
Tags: simd, avx512

.NET8 supports Vector512, but why doesn't Vector<T> reach 512 bits?...
Tags: c#, simd, intrinsics, avx512, .net-8.0

SIMD algorithm to check if an integer block is "consecutive."...
Tags: rust, simd, avx, avx512

Unable to get correct rounding mode code for `vrndscalepd`...
Tags: assembly, floating-point, x86-64, nasm, avx512

Why does adding a vmovapd instruction make SIMD-vectorized code run faster?...
Tags: assembly, simd, microbenchmark, avx512

What are the AVX-512 Galois-field-related instructions for?...
Tags: avx512, galois-field

x86-64 SIMD mechanism to "compare" 8-bit unsigned integers, giving a vector of +1 / 0 / -1...
Tags: simd, avx, avx2, avx512

Am I missing a target-feature for AVX512 when I compile my Rust code?...
Tags: rust, simd, rust-cargo, avx2, avx512

AVX512-FP16 intrinsics fails in release mode, works in debug...
Tags: visual-studio, intrinsics, avx512

Xcode Apple Clang enable avx512...
Tags: xcode, clang, avx, avx2, avx512

why does gcc auto-vectorization for tigerlake use ymm not zmm registers...
Tags: c, gcc, avx, avx512, auto-vectorization

Filling an AVX512 register with incrementing bytes...
Tags: assembly, optimization, x86-64, micro-optimization, avx512

AVX512: Best way to combine horizontal sum and broadcast...
Tags: c, intel, avx, avx512
