Search code examples
Find largest element in matrix and its column and row indexes using SSE and AVX...

c++ matrix sse avx avx2

Does anyone know of a fix for an MSVC compiler bug/annoyance where SIMD Extension settings get "...

c++ visual-c++ simd avx vector-class-library

Generate random numbers in a given range with AVX2, faster than SVML _mm256_rem_epu32 remainder?...

c++ random simd modulo avx

Reverse an AVX register containing doubles using a single AVX intrinsic...

c sse vectorization simd avx

Are there any real benefits to compiling a 32-bit version of my DLL with AVX or higher?...

simd avx vector-class-library

How does the _mm256_shuffle_epi8 make sense in this Game of Life implementation?...

c++ intrinsics avx conways-game-of-life

AVX2: BitScanReverse or CountLeadingZeros on 8 bit elements in AVX register...

c++ simd intrinsics avx avx2

AVX2: CountTrailingZeros on 8 bit elements in AVX register...

c++ simd intrinsics avx avx2

Half-precision floating-point arithmetic on Intel chips...

x86 intel avx floating-point-conversion half-precision-float

Does FFTW determine SIMD version dynamically?...

simd sse avx fftw avx2

SSE-copy, AVX-copy and std::copy performance...

c++ performance sse simd avx

GEMM kernel implemented using AVX2 is faster than AVX2/FMA on a Zen 2 CPU...

assembly matrix-multiplication simd avx micro-optimization

What is the purpose of the MoveMask for SSE and AVX...

.net-core f# x86 sse avx

How to load into __m256 from a float* but reading backwards in memory as opposed to forwards?...

c++ c x86-64 intrinsics avx

Can FP compares like SSE2 _mm_cmpeq_pd be used to compare 64 bit integers?...

simd sse avx sse2

How is the lvalue problem solved for SIMD inline asm with memory output operands in a 2D array?...

c++ assembly inline-assembly avx

In assembly, how to add integers without destroying either operand?...

assembly x86-64 avx gnu-assembler

Understanding C# SIMD output...

c# assembly x86-64 simd avx

AVX2: Is there a way to implement _mm256_mul_epi8 function for a constant power of 2?...

c++ simd intrinsics avx avx2

YASM: vmovaps instruction causing segmentation fault...

assembly x86-64 nasm memory-alignment avx

AVX load instruction with increment...

x86 vectorization simd avx

Extracting ints and shorts from a struct using AVX?...

c++ x86 sse simd avx

When source registers in avx instruction can be reused...

assembly cpu-architecture simd avx micro-optimization

How to convert int 64 to int 32 with avx (but without avx-512)...

simd sse avx

int8 x uint8 matrix-vector product with column-major layout...

assembly x86 simd sse avx

Using AVX CPU instructions: Poor performance without "/arch:AVX"...

c++ performance visual-studio-2010 sse avx

Which versions of Windows support/require which CPU multimedia extensions? (How to check if SSE or A...

windows assembly sse avx avx512

How to interleave 3 float vectors into an array with AVX intrinsics C++...

c++ simd intrinsics avx avx2

How to enable /arch:AVX for Unreal Engine 4?...

c++ visual-studio-2017 unreal-engine4 avx visual-studio-2017-build-tools

Proper use of _mm256_maskload_ps for loading less than 8 floats into __m256...

c++ simd avx
