Search code examples
Can I use movl instead of movb or movw in Assembly on modern systems?...

assembly x86 cpu-architecture micro-optimization mov

4-way bytewise interleave 4x 16-byte vectors from memory, with AVX512...

x86 x86-64 micro-optimization avx512

Insert a bit in a byte at pos n with Assembly...

assembly arm micro-optimization cortex-m

Why jnz requires 2 cycles to complete in an inner loop...

x86 micro-optimization microbenchmark micro-architecture

Cycles/cost for L1 Cache hit vs. Register on x86?...

performance x86 cpu-architecture cpu-cache micro-optimization

When to use a certain calling convention...

assembly x86 x86-64 calling-convention micro-optimization

Is there a penalty when base+offset is in a different page than the base?...

performance assembly x86 micro-optimization

Why does GCC chose dword movl to copy a long shift count to CL?...

assembly gcc x86-64 micro-optimization

Is there any performance difference in using int versus int8_t...

c types micro-optimization

Does cmpxchg write destination cache line on failure? If not, is it better than xchg for spinlock?...

assembly x86 cpu-cache micro-optimization compare-and-swap

What is the optimal way for reading the contents of a webpage into a string in Java?...

java string optimization inputstream micro-optimization

Why this unnecessary MOVAPD copy in gcc 9.1, in a tiny function...

assembly gcc x86-64 sse micro-optimization

In x86-64 asm: is there a way of optimising two adjacent 32-bit stores / writes to memory if the sou...

assembly optimization x86-64 micro-optimization

Instructions to copy the low byte from an int to a char: Simpler to just do a byte load?...

c assembly x86-64 micro-optimization instructions

Can this MIPS assembly code be simplified?...

string assembly mips micro-optimization simplify

Avoiding AVX-SSE (VEX) Transition Penalties...

assembly x86 sse avx micro-optimization

Is movzbl followed by testl faster than testb?...

performance assembly x86 x86-64 micro-optimization

An implementation of std::atomic_thread_fence(std::memory_order_seq_cst) on x86 without extra perfor...

c++ x86 inline-assembly micro-optimization memory-barriers

What is the fastest way to swap the bytes of an unaligned 64 bit value in memory?...

performance assembly x86-64 endianness micro-optimization

Should combining memory fence for mutex acquire-exchange loop (or queue acquire-load loop) be done o...

arm cpu-architecture micro-optimization memory-barriers

How to instruct MS Visual C++ compiler to use an uninitialized __m512i register...

c++ visual-c++ intrinsics micro-optimization avx512

Do java finals help the compiler create more efficient bytecode?...

java optimization micro-optimization

How can one figure out if a loop is being entered with a 16 byte aligned address in x86-64 assembly?...

assembly optimization x86-64 memory-alignment micro-optimization

fastest way to negate a number...

c++ visual-c++ x86 micro-optimization visual-c++-2012

repz ret: why all the hassle?...

assembly x86 micro-optimization amd-processor branch-prediction

What does `rep ret` mean?...

assembly x86 micro-optimization branch-prediction

Is calling `add` on a memory location faster than calling it on a register and then moving the value...

assembly x86 x86-64 micro-optimization

Fast method to copy memory with translation - ARGB to BGR...

c x86 rgb sse micro-optimization

Impact on performance when having multiple returns...

assembly x86 micro-optimization

80286: Which is the fastest way to multiply by 10?...

assembly x86-16 micro-optimization
