
Assembly x86 Irvine WriteString Byte vs Dword


I've just noticed something interesting, and I am trying to get a better understanding. I tried to use the Irvine WriteString call.

WriteString

Write a null-terminated string.
Input: EDX points to the string's offset

INCLUDE Irvine32.inc

.data
    fizz        BYTE    "Fizz",  0

...

MOV EDX, OFFSET fizz
CALL WriteString
CALL CrLf

This works perfectly: I can see "Fizz" in my window, just like in the documentation.

However, if I use DWORD instead of BYTE:

fizz        DWORD   "Fizz",  0

I see "zziF" in the console window. As far as I know, the only difference between BYTE and DWORD is the size in bits (8 bits vs. 32 bits). I really don't understand the reversed order. What's happening?

I appreciate every answer!


Solution

  • Your question has absolutely nothing to do with "Irvine" or "WriteString".

    The short answer is that what you are seeing is a direct result of the fact that you are programming on an Intel x86 architecture, which is little endian, and you are using an assembler which was written by some old hacker.

    The long answer follows.

    If you are dealing with assembly language then you should learn everything there is to know about endianness, by looking it up on Wikipedia or someplace else, but in a nutshell it defines how bytes are stored in successive memory locations to form larger-than-byte quantities. There are two types of endianness:

    • big endian, and
    • little endian.

    Little endian means that in larger-than-byte quantities, the least significant ("low") byte is stored first, followed by bytes of successively higher significance. Consequently, in a DWORD, the least significant ("low") word is stored first, and the most significant ("high") word follows. This is the opposite of big endian, where the high byte is stored first.

    You might think that big endian is more intuitive, because it more closely matches how we humans write numbers, with the most significant digit leftmost and less significant digits following. But that's just us humans, and it is entirely arbitrary; nothing about the nature of numbers dictates whether the significance of digits should run from left to right or from right to left. On the contrary, little endian has certain hardware benefits, and that's what Intel chose for the x86 architecture.
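    To make the two byte orders concrete, here is a minimal sketch in C rather than assembly, so that it does not depend on the machine it runs on. The helper names store_le32 and store_be32 are made up for this illustration; they write the bytes out by hand instead of relying on the host CPU's own byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value low byte first (little endian, as on x86).
   Hypothetical helper, named for this illustration only. */
void store_le32(uint32_t value, uint8_t out[4])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)((value >> (8 * i)) & 0xFF);
}

/* Store the same value high byte first (big endian).
   Hypothetical helper, named for this illustration only. */
void store_be32(uint32_t value, uint8_t out[4])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)((value >> (8 * (3 - i))) & 0xFF);
}
```

    For the value 0x12345678, store_le32 fills the buffer with 78 56 34 12, while store_be32 fills it with 12 34 56 78.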

    But I digress.

    So, here is what is happening:

    The assembler that you are using is trying to be smart, and to allow you to specify the value of a DWORD using a string literal. This is nonsensical, because a DWORD is supposed to contain a 32-bit number, not a string, but they are trying to accommodate dirty hacks. It is also entirely arbitrary, because there are many ways one could imagine that such a quirk could be implemented, and they have just picked one way, I suppose the one they fancied most.

    So, apparently, what they do is take your string literal and treat it as a group of four characters forming a single DWORD, with the first character in the most significant byte. When they store that DWORD in memory, they store it in little-endian order, as is appropriate for the Intel architecture, so the bytes in memory read "zziF" instead of "Fizz". (The trailing 0 becomes a second, all-zero DWORD, so the string is still null-terminated.) It is important to understand that this "zziF" is baked into your program by the assembler, and that the "WriteString" function prints exactly what it finds in memory. The same would be printed by any other function that prints strings. It is not the function's fault.
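    The two steps described above can be modelled in C as a rough sketch. It assumes the assembler places the first character of the literal in the most significant byte, which is inferred from the "zziF" output rather than taken from the assembler's documentation. The function names pack_chars and dword_as_string are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pack four characters into one 32-bit value, first character in the
   most significant byte. This is an assumption about what the assembler
   does, inferred from the observed output. */
uint32_t pack_chars(const char text[4])
{
    assert(text != NULL);
    uint32_t value = 0;
    for (int i = 0; i < 4; i++)
        value = (value << 8) | (uint8_t)text[i];
    return value;
}

/* Lay the value out in memory low byte first, as x86 would, and add a
   terminator so the bytes can be read back as a C string. */
void dword_as_string(uint32_t value, char out[5])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++)
        out[i] = (char)((value >> (8 * i)) & 0xFF);
    out[4] = '\0';
}
```

    Here pack_chars("Fizz") yields 0x46697A7A ('F' is 0x46, 'i' is 0x69, 'z' is 0x7A), and passing that value to dword_as_string produces the string "zziF", which is what any string-printing routine then sees.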