Is performance reduced when executing loops whose uop count is not a multiple of processor width?...


performance assembly x86 cpu-architecture micro-optimization

What is the purpose of using index caches in rigtorp's SPSCQueue...


queue cpu-architecture cpu-cache micro-optimization lock-free

Branchless count-leading-zeros on 32-bit RISC-V without Zbb extension...


algorithm bit-manipulation riscv micro-optimization riscv32

Is it still worth using the Quake fast inverse square root algorithm nowadays on x86-64?...


algorithm optimization x86-64 micro-optimization sqrt

Fast BCD addition...


c algorithm bit-manipulation micro-optimization bcd

What is the most optimal way to use a C# struct as the key of a dictionary?...


c# .net optimization micro-optimization

Very fast approximate Logarithm (natural log) function in C++?...


c++ math logarithm micro-optimization sqrt

Is there any data on the latency of an AVX2 gather instruction?...


performance x86 latency micro-optimization avx2

Why is `if x is None: pass` faster than `x is None` alone?...


python performance cpython micro-optimization python-internals

Optimized 53->32 bit modulo computation on 32-bit processors...


c algorithm micro-optimization integer-division

INC instruction vs ADD 1: Does it matter?...


performance assembly x86 increment micro-optimization

Is using AVX2 can implement a faster processing of LZCNT on a word array?...


x86 simd avx micro-optimization avx2

Is it possible to check if 2 sets of 3 ints have at least one element in common with less than 9 com...


c performance bit-manipulation micro-optimization

What's the difference between _mm256_lddqu_si256 and _mm256_loadu_si256...


x86 simd intrinsics avx micro-optimization

Why doesn't the C++ standard library utilize likely/unlikely attributes?...


c++ algorithm compiler-optimization micro-optimization likely-unlikely

Test whether a register is zero with CMP reg,0 vs OR reg,reg?...


assembly optimization x86 micro-optimization

Divide by 10 using bit shifts?...


math bit micro-optimization low-level integer-division

How exactly do partial registers on Haswell/Skylake perform? Writing AL seems to have a false depend...


assembly x86 intel cpu-architecture micro-optimization

Why does mulss take only 3 cycles on Haswell, different from Agner's instruction tables? (Unroll...


c assembly x86 sse micro-optimization

Converting nucleobase representation from ASCII to UCSC .2bit...


c algorithm bit-manipulation bioinformatics micro-optimization

Can packing variables or parameters into structures/unions introduce unforseen performance penalties...


c gcc struct micro-optimization swar

Floating point division vs floating point multiplication...


c++ floating-point micro-optimization

Controlling class member layout AND destructor order...


c++ class constructor destructor micro-optimization

JavaScript: Is the `if / else` statement faster than the conditional statement in?...


javascript performance if-statement conditional-statements micro-optimization

Do most compilers optimize MATMUL(TRANSPOSE(A),B)?...


fortran gfortran intel-fortran micro-optimization

Is x >= 0 more efficient than x > -1?...


c++ optimization micro-optimization

Fastest way to find 16bit match in a 4 element short array?...


c x86-64 micro-optimization swar

Why XOR before SETcc?...


c++ assembly x86 micro-optimization

In assembly, should branchless code use complementary CMOVs?...


assembly x86 micro-optimization branchless conditional-move

Fast sign of integer in C...


c micro-optimization signed-integer
