Generate FMOV without inline assembly...
Tags: clang, simd, arm64, micro-optimization, sve

Branch on ?: operator?...
Tags: c++, hardware, micro-optimization, branchless

Micro Optimization of a 4-bucket histogram of a large array or list...
Tags: c#, optimization, histogram, simd, micro-optimization

INC instruction vs ADD 1: Does it matter?...
Tags: performance, assembly, x86, increment, micro-optimization

boost::thread data structure sizes on the ridiculous side?...
Tags: c++, boost-asio, boost-thread, micro-optimization, systems-programming

Preserving the Execution pipeline with branch layout in C source? Which prediction do CPUs or compil...
Tags: c, cpu-architecture, compiler-optimization, micro-optimization, branch-prediction

Why is this reordering of sub and mul instructions helpful?...
Tags: c, assembly, gcc, cpu-architecture, micro-optimization

Cost of exception handlers in Python...
Tags: python, performance, exception, micro-optimization

Is it "too clever" for using LEA to load constant to register?...
Tags: assembly, x86-64, nasm, micro-optimization

Use two loop bodies or one (result identical)?...
Tags: c, optimization, caching, cpu, micro-optimization

Using the operand-size override prefix 0x66 for instruction alignment...
Tags: assembly, x86-64, masm, memory-alignment, micro-optimization

Extract fractional part of double *efficiently* in C...
Tags: c, floating-point, double, micro-optimization, bit-manipulation

Why is my operator ++ more than twice as fast as its equivalent instance method?...
Tags: c#, .net-core, micro-optimization, benchmarkdotnet

Is performance reduced when executing loops whose uop count is not a multiple of processor width?...
Tags: performance, assembly, x86, cpu-architecture, micro-optimization

what is the purpose of using index caches in rigtorp's SPSCQueue...
Tags: queue, cpu-architecture, cpu-cache, micro-optimization, lock-free

Branchless count-leading-zeros on 32-bit RISC-V without Zbb extension...
Tags: algorithm, bit-manipulation, riscv, micro-optimization, riscv32

Is it still worth using the Quake fast inverse square root algorithm nowadays on x86-64?...
Tags: algorithm, optimization, x86-64, micro-optimization, sqrt

Fast BCD addition...
Tags: c, algorithm, bit-manipulation, micro-optimization, bcd

What is the most optimal way to use a C# struct as the key of a dictionary?...
Tags: c#, .net, optimization, micro-optimization

Very fast approximate Logarithm (natural log) function in C++?...
Tags: c++, math, logarithm, micro-optimization, sqrt

Is there any data on the latency of an AVX2 gather instruction?...
Tags: performance, x86, latency, micro-optimization, avx2

Why is `if x is None: pass` faster than `x is None` alone?...
Tags: python, performance, cpython, micro-optimization, python-internals

Optimized 53->32 bit modulo computation on 32-bit processors...
Tags: c, algorithm, micro-optimization, integer-division

Is using AVX2 can implement a faster processing of LZCNT on a word array?...
Tags: x86, simd, avx, micro-optimization, avx2

Is it possible to check if 2 sets of 3 ints have at least one element in common with less than 9 com...
Tags: c, performance, bit-manipulation, micro-optimization

what's the difference between _mm256_lddqu_si256 and _mm256_loadu_si256...
Tags: x86, simd, intrinsics, avx, micro-optimization

Why doesn't the C++ standard library utilize likely/unlikely attributes?...
Tags: c++, algorithm, compiler-optimization, micro-optimization, likely-unlikely

Test whether a register is zero with CMP reg,0 vs OR reg,reg?...
Tags: assembly, optimization, x86, micro-optimization

Divide by 10 using bit shifts?...
Tags: math, bit, micro-optimization, low-level, integer-division

How exactly do partial registers on Haswell/Skylake perform? Writing AL seems to have a false depend...
Tags: assembly, x86, intel, cpu-architecture, micro-optimization
