performance assembly x86-16 cpu-registers micro-optimization

Most Efficient way to set Register to 1 or (-1) on original 8086

I am taking an assembly course now, and the guy who checks our home assignments is a very pedantic old-school optimization freak. For example, he deducts 10% if he sees:

mov ax, 0

instead of:

xor ax,ax

even if it's only used once.

I am not a complete beginner in assembly programming, but I'm not an optimization expert, so I need your help with something (it might be a very stupid question, but I'll ask anyway): if I need to set a register to 1 or (-1), is it better to use:

mov ax, 1

or do something like:

xor ax,ax
inc ax

I really need a good grade, so I'm trying to get it as optimized as possible. (I need to optimize for both time and code size.)

Solution

A quick Google search for "8086 instruction timings size" turned up a listing of instruction timings which seems to have all the timings and sizes for the 8086/8088 through the Pentium.

Note, though, that this listing probably doesn't account for code-fetch memory bottlenecks, which can be very significant, especially on an 8088. This usually makes optimizing for code size the better choice. See here for some details on this.

No doubt you could find official Intel documentation on the web with similar information, such as the "8086/8088 User's Manual: Programmer's and Hardware Reference".

For your specific question, the table below gives a comparison that indicates the plain mov ax, 1 is better (fewer cycles, and the same space):

Instructions    Clock cycles    Bytes
xor ax, ax      3               2
inc ax          3               1
(total)         6               3
mov ax, 1       4               3
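
The question also asks about -1, which the table does not cover. Here is a minimal sketch of the two equivalent candidates, assuming NASM-style syntax (the question doesn't name an assembler). The byte counts follow from the standard 8086 encodings. The cycle count for dec ax is an assumption: I've taken it to be the same as the inc ax figure in the table, so check it against the listing before relying on it.

```nasm
bits 16

; Option A: load the immediate directly. Does not touch the flags.
        mov     ax, -1          ; 3 bytes (opcode + 16-bit immediate), 4 cycles

; Option B: zero the register, then decrement. Modifies the flags.
        xor     ax, ax          ; 2 bytes, 3 cycles
        dec     ax              ; 1 byte,  ASSUMED 3 cycles (same as inc ax in the table)
```

Under those assumptions the result matches the table for 1: both options take 3 bytes and the single mov takes fewer cycles. Note also that the xor/dec (or xor/inc) pair changes the flags while mov leaves them alone, which matters if a later conditional jump depends on an earlier comparison.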

But you might want to talk to your educational institute about this guy. A 10% penalty for a simple thing like that seems quite harsh. You should ask what should be done in the case where you have two possibilities, one faster and one shorter.

Then, once they've admitted that there are different ways to optimise code depending on what you're trying to achieve, tell them that what you're trying to do is optimise for readability and maintainability, and seriously couldn't give a damn about a wasted cycle or byte here or there⁽¹⁾.

Optimisation is something you generally do if and when you have a performance problem, after a piece of code is in a near-complete state - it's almost always wasted effort when the code is still subject to a not-insignificant likelihood of change.

For what it's worth, sub ax,ax appears to be on par with xor ax,ax in terms of clock cycles and size, so maybe you could throw that into the mix next time to cause him some more work.
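
To make that concrete, here are the two zeroing idioms side by side, again assuming NASM-style syntax. Both are register-to-register ALU instructions with a two-byte encoding (opcode plus ModR/M byte), and both leave AX = 0 with the zero flag set and the carry flag clear.

```nasm
bits 16

        xor     ax, ax          ; 2 bytes; AX = 0, ZF = 1, CF = 0
        sub     ax, ax          ; 2 bytes; AX = 0, ZF = 1, CF = 0
```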

⁽¹⁾ No, don't really do that, but it's fun to vent occasionally :-)