Search code examples
Is there another way to calculate (a^3) * b + 5*(b^2) without so many mul instructions?...

assembly, x86, micro-optimization

How do you reason about fluctuations in benchmarking data?...

performance, optimization, benchmarking, micro-optimization, microbenchmark

Latency bounds and throughput bounds for processors for operations that must occur in sequence...

performance, cpu-architecture, micro-optimization

What is the fastest way to find if a number is even or odd?...

c, micro-optimization

Which is better option to use for dividing an integer number by 2?...

c++, c, optimization, division, micro-optimization

Fastest way to set highest order bit of rax register to lowest order bit in rdx register...

assembly, optimization, bit-manipulation, x86-64, micro-optimization

One instruction to clear PF (Parity Flag) -- get odd number of bits in result register...

assembly, x86, micro-optimization, parity

Does using xor reg, reg give advantage over mov reg, 0?...

assembly, x86, micro-optimization

Which sequence of instructions has better performance for zeroing one register or another?...

performance, assembly, mips, cpu-architecture, micro-optimization

Inlining assembly in C...

c, assembly, gcc, x86-64, micro-optimization

Which is faster for bitwise NOT operation: precalculated table or `~`...

c++, cpu-architecture, cpu-cache, micro-optimization

Does optimizing code in TI-BASIC actually make a difference?...

optimization, micro-optimization, ti-basic

Are there performance/storage differences between uint2 and uint64_t in cuda10+?...

cuda, gpu, hpc, micro-optimization

Most Efficient way to set Register to 1 or (-1) on original 8086...

performance, assembly, x86-16, cpu-registers, micro-optimization

Copy bit of one register to another register (x86-64 asm)...

assembly, x86, bit-manipulation, micro-optimization

When, if ever, is loop unrolling still useful?...

performance, optimization, language-agnostic, micro-optimization, loop-unrolling

Is the python "elif" compiled differently from else: if?...

java, python, c++, if-statement, micro-optimization

Using Intrinsics to Extract And Shift Odd/Even Bits...

c++, bit-manipulation, intrinsics, micro-optimization

Does a Length-Changing Prefix (LCP) incur a stall on a simple x86_64 instruction?...

performance, assembly, x86-64, cpu-architecture, micro-optimization

Why does the short (16-bit) variable mov a value to a register and store that, unlike other widths?...

c, assembly, x86, cpu-architecture, micro-optimization

How to speed up my Print all partitions of an n-element set into k unordered sets...

c++, performance, optimization, micro-optimization

Go: multiple len() calls vs performance?...

algorithm, go, micro-optimization

x86 Assembly pushad/popad, How fast it is?...

performance, assembly, x86, inline-assembly, micro-optimization

How to force GCC to assume that a floating-point expression is non-negative?...

c++, gcc, assembly, floating-point, micro-optimization

Intel JCC Erratum - what is the effect of prefixes used for mitigation?...

assembly, x86, intel, cpu-architecture, micro-optimization

GEMM kernel implemented using AVX2 is faster than AVX2/FMA on a Zen 2 CPU...

assembly, matrix-multiplication, simd, avx, micro-optimization

string_view Vs const char* performance...

c++, c-strings, stdstring, micro-optimization, string-view

Do 32-bit and 64-bit registers cause differences in CPU micro architecture?...

assembly, x86-64, intel, cpu-architecture, micro-optimization

Predecoders and decoders. Difference...

assembly, x86, cpu-architecture, micro-optimization

When joining four 1-byte vars into one 4-byte word, which is a faster way to shift and OR ? (compari...

c, assembly, x86, bit-manipulation, micro-optimization
