assembly x86 ascii masm emu8086

Assembly language numbers: is MOV AX,1 ASCII or integer


What is the difference between

num db 1
mov ax,1

and mov ax, num?

Is it ASCII or integer when mov ax,1 is executed?

I mean, the number is not typed in from the keyboard, it is predefined. Does it need to be converted to an integer? Is it ASCII?


Solution

  • ASCII is a type of encoding, i.e. a convention for interpreting certain numeric values, like "33 is an exclamation mark".

    But the CPU doesn't know about ASCII. mov al,33 just sets the al register to the bit pattern 00100001, and only when some other code uses that value as an ASCII character (for example by drawing the matching glyph from a font onto the display) does it show up as an exclamation mark.

    At the CPU level it's just the number 33. mov al,'!', mov al,33 and mov al,21h all produce the identical machine instruction (loading the bit pattern 00100001 into register al), so there is no difference for the CPU. The difference is only "formatting sugar" in the source, which makes the programmer's intent easier to see: whether the 33 in al is meant to be used as an ASCII character (mov al,'!') or as a numeric value in some calculation (mov al,33).
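
    A short sketch of that point, in the same emu8086/MASM-style syntax as the question. The bytes in the comments follow the standard 8086 encoding of mov al with an 8-bit immediate (opcode B0h followed by the immediate byte); the binary-literal line is an extra spelling added here for illustration.

    ```asm
    mov al, '!'        ; character literal    -> B0 21
    mov al, 33         ; decimal literal      -> B0 21
    mov al, 21h        ; hexadecimal literal  -> B0 21
    mov al, 00100001b  ; binary literal       -> B0 21
    ```

    All four lines leave the same bit pattern in al; only the spelling in the source differs.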

    To enter an ASCII character value in the source, use single quotes, as in mov ax,'1', which assembles as mov ax,49. This works in emu8086, MASM and almost all other x86 assemblers, but it is a feature of the assembler: you may run into one that does not understand this character syntax, and then you have to write mov ax,49 yourself to get the same result.
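
    Conversion only becomes necessary when a digit arrives as a character, for example from the keyboard. A minimal sketch, assuming a DOS environment (such as the one emu8086 emulates) where INT 21h function 01h returns the typed character in al; it does no input validation:

    ```asm
    mov ah, 01h     ; DOS function 01h: read one character (with echo)
    int 21h         ; al = ASCII code of the key, e.g. '1' = 49 (31h)
    sub al, '0'     ; '0' = 48, so the digit keys '0'..'9' become 0..9
                    ; (any other key yields a value outside 0..9)
    ```

    A constant written as 1 in the source needs no such step, because the assembler already stores it as the integer 1.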


    mov ax,num reads a word, but db 1 defines only one byte. If the assembler accepts the instruction (MASM, for one, reports an operand size mismatch unless you write mov ax,word ptr num), al (the bottom 8 bits of ax) is set to 1 and ah (the upper 8 bits of ax) gets whatever byte happens to follow num in memory. To make sure you load the word value 1, define two bytes at address num, like num db 1, 0 or, easier to read, num dw 1. Both variants produce identical bytes in memory; the difference is only in the source code.
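
    To make the stray upper byte concrete, here is a sketch with a hypothetical second variable, other, placed directly after num. The value 7 is arbitrary, and the word ptr override is the MASM-style way to request a word-sized read of a byte variable on purpose.

    ```asm
    ; data (keep it out of the execution path, e.g. in a data segment)
    num   db 1                ; one byte at address num:       01
    other db 7                ; hypothetical byte right after: 07

    ; code
    mov ax, word ptr num      ; reads bytes 01 07 -> al = 01h, ah = 07h, ax = 0701h
    mov al, num               ; byte-sized read: al = 1, ah is left unchanged
    ```

    Because x86 is little-endian, the byte at the lower address (num) lands in al and the following byte lands in ah.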

    If you define num dw 1, then mov ax,1 and mov ax,num give the identical result: in both cases ax contains the value 1. But in the first variant the value 1 is encoded inside the instruction itself (B8 01 00 is the machine code for mov ax,1 on an 8086 CPU). The second variant reads two bytes from memory, in addition to the three bytes of the instruction itself, which were fetched and decoded beforehand. I strongly recommend writing it with brackets as mov ax,[num], even in emu8086 or MASM, to make the memory access visible when reading the source code.
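
    The two encodings side by side, as a sketch that assumes num was defined with num dw 1. The offset 0110h is a made-up example, since the real address of num depends on where the assembler places it, and the bytes shown assume the assembler picks the short accumulator form (opcode A1h), as is usual.

    ```asm
    mov ax, 1        ; B8 01 00   the value 0001h is stored inside the instruction
    mov ax, [num]    ; A1 10 01   the address 0110h is stored inside the instruction,
                     ;            the value itself is fetched from memory at run time
    ```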

    So mov ax,1 will probably perform better in most scenarios, but mov ax,[num] is more flexible: something can modify the value in memory, and then the result is no longer 1 but the new value. (It is also possible to self-modify the instruction mov ax,1 to change the value encoded in its machine code, but that is generally frowned upon: it usually makes the source harder to understand, and it has severe performance and security implications on modern x86 machines, so the practice has largely been abandoned.)
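
    The flexibility of the memory variant can be sketched as follows, again assuming num is defined as a word; the new value 5 is arbitrary.

    ```asm
    ; data
    num dw 1                  ; word variable, initially 1

    ; code
    mov ax, [num]             ; ax = 1
    mov word ptr [num], 5     ; overwrite the variable in memory
    mov ax, [num]             ; ax = 5: same instruction, different result
    mov ax, 1                 ; ax = 1 again, independent of memory contents
    ```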