intrinsics Examples and Free Source Code

Find position of the unique set bit in 32-bit number...

c++ assembly x86 bit-manipulation intrinsics

SSE intrinsics atan2...

c++ trigonometry simd sse intrinsics

AVX512-FP16 intrinsics fail in release mode, work in debug...

visual-studio intrinsics avx512

SIMD _mm_store_si128 | _mm_storeu_si128 not storing correctly...

c++ simd intrinsics instruction-set

Seg fault while using _mm256_i64gather_pd...

c++ intrinsics avx avx2

Difference between _mm_storeu_si128 and _mm_loadu_si128...

c sse intrinsics

Is it safe to compile one source with SSE2 and another with AVX architecture?...

visual-c++ sse intrinsics avx

Shuffling a vector by number of bytes...

c++ x86 sse intrinsics avx

Transpose 4x4 int32 matrix using NEON...

assembly arm intrinsics neon

Extract the low bit of each bool byte in a __m128i? bool array to packed bitmap...

c++ gcc sse intrinsics

How to compile program with _mm_clflushopt function? error: inlining failed...

c gcc compilation intrinsics

How to implement an efficient _mm256_madd_epi8 dot-products of groups of four i8 elements?...

c++ x86 simd intrinsics avx2

Accumulating vector in __m128 using _mm_hadd_ps produces compile-time error...

c intrinsics

Using Horizontal Neon intrinsics efficiently...

assembly inline-assembly arm64 intrinsics neon

How to convert 32-bit float to 8-bit signed char? (4:1 packing of int32 to int8 __m256i)...

c x86 simd intrinsics avx2

Use C's `nmmintrin.h` in Zig...

c intrinsics zig

Using !Ref in second argument in SAM template...

amazon-web-services yaml aws-cloudformation intrinsics sam

Efficiently extract single double element from AVX-512 vector...

simd intrinsics avx512

Fastest way to implement _mm256_mullo_epi4 using AVX2...

c x86-64 intrinsics avx avx2

How to multiply-accumulate unsigned bytes into 32-bit elements without overflow with RISC-V extensio...

c vectorization simd intrinsics riscv

Usage of __AVX512F__ in Visual Studio for compiling code...

c++ visual-studio visual-c++ intrinsics avx512

Are there macros for SIMD instruction sets?...

c# simd intrinsics

Counter-intuitive results while playing with intrinsics...

c++ simd intrinsics avx2 microbenchmark

Testing for builtins/intrinsics...

c gcc intrinsics

Adding 3D vectors using SIMD intrinsics...

c++ vectorization simd intrinsics avx2

Why do compilers not coerce "n / 2.0" into "n * 0.5" if it's faster?...

c++ c compiler-optimization intrinsics

How to calculate 2x2 matrix multiplied by 2D vector using SSE intrinsics (32 bit floating points)? (...

c++ optimization matrix-multiplication sse intrinsics

Is there a list of all compiler intrinsic functions for Delphi by version?...

delphi intrinsics

Extracting edges of AVX2 16x16 bitmatrix...

c bit-manipulation intrinsics avx2

Are "intrinsics" possible on GPU in OpenGL?...

performance opengl gpu glsl intrinsics