Search code examples
What do multiple values or ranges mean as the latency for a single instruction?...

performance, assembly, x86, cpu-architecture, micro-optimization

Why use push/pop instead of sub and mov?...

assembly, x86, x86-64, cpu-architecture, micro-optimization

Fastest way to initialize a __m128i constant with intrinsics?...

c, visual-c++, sse, intrinsics, micro-optimization

How to copy a register and do `x*4 + constant` with the minimum number of instructions...

assembly, x86, micro-optimization

latency for 'pcmpeqb' - memory vs xmm register...

assembly, optimization, sse, micro-optimization, sse2

Difference between n = 0 and n = n - n...

c, assembly, optimization, compiler-construction, micro-optimization

Allocating memory aligned buffers for SIMD; how does |16 give an odd multiple of 16, and why do it?...

c++, dynamic-memory-allocation, simd, memory-alignment, micro-optimization

How can I rearrange MIPS code to minimise the number of NOPs needed, by hand?...

assembly, optimization, mips, pipeline, micro-optimization

array_push() vs. $array[] = .... Which is fastest?...

php, mysql, micro-optimization

Micro optimization: Returning from an inner block at the end of a function...

javascript, micro-optimization

Weird performance effects from nearby dependent stores in a pointer-chasing loop on IvyBridge. Addin...

assembly, x86, micro-optimization, microbenchmark, micro-architecture

Can compilers ever optimize variables to use less than a byte of space?...

algorithm, performance, optimization, bit-manipulation, micro-optimization

Loop optimization. How does register renaming break dependencies? What is execution port capacity?...

performance, optimization, x86, cpu-architecture, micro-optimization

Fast Way of Indexing Operator in Python (lambda i: l[i])...

python, micro-optimization

The advantages of using 32bit registers/instructions in x86-64...

gcc, assembly, x86-64, micro-optimization

Transpile forEach, map, filter and for of to length based for loop in JavaScript...

javascript, webpack, micro-optimization

Why are bitwise operators slower than multiplication/division/modulo?...

python, optimization, bitwise-operators, micro-optimization

Is it useful to check if a Java collection is empty before beginning iteration?...

java, collections, garbage-collection, iteration, micro-optimization

Address-size override prefix in 64-bit or using 64-bit registers...

assembly, x86-64, micro-optimization, addressing-mode

Assembly Jump with Multiple plus or do plus before jump (performance)...

performance, assembly, x86, micro-optimization

Performance of assembly function with multiple RET...

performance, assembly, x86, x86-64, micro-optimization

Is there a penalty in having a non-aligned Jcc which is nearly never taken in Intel/AMD 64?...

loops, branch, x86-64, memory-alignment, micro-optimization

Fast method for testing a bit of a large int...

python, python-3.x, optimization, micro-optimization, largenumber

Is CMOVcc considered a branching instruction?...

assembly, x86-64, cpu-architecture, micro-optimization, branch-prediction

How can I resolve data dependency in pointer arrays?...

c++, performance, compiler-optimization, micro-optimization

Does Skylake need vzeroupper for turbo clocks to recover after a 512-bit instruction that only reads...

assembly, x86, intel, micro-optimization, avx512

Why swap doesn't use Xor operation in C++...

c++, swap, xor, micro-optimization, premature-optimization

Ways to make a D program faster...

optimization, d, micro-optimization, dmd, ldc

Why does .NET Native compile loop in reverse order?...

c#, assembly, x86, micro-optimization, .net-native

Micro-optimizing a linear search loop over a huge array with OpenMP: can't break on a hit...

c, loops, pthreads, openmp, micro-optimization
