MARS MIPS Simulator ASCII string not storing in memory in little-endian properly?


I have heard that the MARS MIPS Simulator is little-endian, so I expected that if I place the string "HELLO" in memory using .asciiz "HELLO", the character O would be at the lowest memory address and H at the highest memory address of the string.

But when I assembled the code, MARS's debugger shows memory like this:

[Screenshot: MARS data segment view]

H is stored at address 0x10010000 (the data segment base address) and O is stored at 0x10010004, which is clearly a higher memory address. Isn't this big-endian?

But I've noticed that when I reserve word-size data like 0x0000ABCD using .word 0xABCD, the low byte 0xCD is placed at the lowest address, as a little-endian system must do. Why are the two stored differently?


Solution

  • Strings are a sequence of bytes, not a single huge integer. The first byte of a string is always at the lowest address, regardless of machine endianness.

    The machine endianness only determines what value you'd get in a register if you did lw $t0, my_string.

    But if you loop over the bytes of the string with lbu $a0, ($t1) / addiu $t1, $t1, 1, you definitely want to get the ASCII bytes in the order you wrote them in the source: H, E, L, L, O, 0.

    If you want to store your string backwards, use .asciiz "OLLEH".
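To make the difference between a word load and a byte loop concrete, here is a minimal sketch in C (used instead of MIPS so it runs on any host). The helper names load_word_le and load_word_be are made up for illustration: they rebuild the value a 32-bit load would put in a register from the first four bytes of the string, under each byte order, without depending on the endianness of the machine running the code.

```c
#include <stdint.h>

/* Hypothetical helper: the register value a little-endian machine
   (such as MARS) would produce for a 32-bit load from p.
   The byte at the lowest address becomes the least significant byte. */
uint32_t load_word_le(const unsigned char *p) {
    return  (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: the same load on a big-endian machine.
   The byte at the lowest address becomes the most significant byte. */
uint32_t load_word_be(const unsigned char *p) {
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         |  (uint32_t)p[3];
}
```

For the bytes 'H' 'E' 'L' 'L' (0x48 0x45 0x4C 0x4C), load_word_le gives 0x4C4C4548 and load_word_be gives 0x48454C4C. The bytes in memory are identical in both cases; only the interpretation of four of them as one integer differs.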


    Byte-streams don't have endianness, only things the CPU can load with a single access. The whole concept of endianness comes from being able to access the individual bytes of a word, e.g. sw then lbu.

    If you could only use lw/sw then hardware endianness wouldn't be a thing: it would be up to software how to shift / OR or AND to access bits within a 32-bit integer if it wanted to pack 8-bit ASCII characters. Likewise, if you could only use lbu / sb, it would be up to software what order to store the separate bytes of a longer integer.
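As a rough illustration of the word-only case, the sketch below packs 8-bit characters into a 32-bit integer using only shifts and masks. The names put_char and get_char are hypothetical, and putting character 0 in the low 8 bits is an arbitrary choice made for this example; software on such a machine could equally well pick the opposite order.

```c
#include <stdint.h>

/* Hypothetical convention: character `index` (0..3) lives in bits
   8*index .. 8*index+7 of the word. Nothing in hardware forces this;
   it is purely a software decision when only whole-word access exists. */
uint32_t put_char(uint32_t word, unsigned index, unsigned char c) {
    unsigned shift = 8u * index;
    word &= ~((uint32_t)0xFFu << shift);  /* clear the old character */
    word |= (uint32_t)c << shift;         /* insert the new one */
    return word;
}

unsigned char get_char(uint32_t word, unsigned index) {
    return (unsigned char)((word >> (8u * index)) & 0xFFu);
}
```

With this convention, packing 'H', 'E', 'L', 'L' at indices 0 to 3 yields 0x4C4C4548, which happens to be the same value a little-endian byte-addressable machine would load from those four bytes.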

    For strings, everyone makes the sensible choice to store them in printing order, with the first byte at the lowest address. This happens to match the order you want them in for a text file, or for video RAM which scans left-to-right within a line.

    So again, endianness only matters for strings when you're implementing an efficient strlen that checks 4 bytes at a time for 0, using a bithack or something: https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord.
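A minimal sketch of such a word-at-a-time strlen in C, using the zero-byte test from the linked bithacks page. The function names are invented for this example, and it assumes the caller's buffer is padded to a multiple of 4 bytes so that reading a whole word never runs past the allocation; a real implementation would typically align the pointer first instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if any of the four bytes of v is 0 (the "haszero" bithack).
   It says that a zero byte exists, not which byte it is. */
int has_zero_byte(uint32_t v) {
    return ((v - 0x01010101u) & ~v & 0x80808080u) != 0;
}

/* Assumption: s points into a buffer whose size is a multiple of 4,
   so every 4-byte read below stays inside the buffer. */
size_t strlen_by_words(const char *s) {
    size_t i = 0;
    for (;;) {
        uint32_t w;
        memcpy(&w, s + i, sizeof w);  /* 4-byte load without alignment issues */
        if (has_zero_byte(w))
            break;                    /* terminator is somewhere in this word */
        i += 4;
    }
    while (s[i] != '\0')              /* finish byte by byte */
        i++;
    return i;
}
```

Because the final step scans the last word one byte at a time, this version gives the same answer on either byte order. Endianness only starts to matter if you want to locate the zero byte directly from the bit pattern of the word.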

    Although that bithack doesn't actually tell you where the zero byte is, so maybe a better example would be implementing strcmp by comparing 4 bytes (1 word) at a time. On a mismatch, you could either loop 1 byte at a time within those words to find the exact character that differed, or you could XOR the two words together, then find the position of the lowest set bit (little-endian) or highest set bit (big-endian) to find out which byte contained the first bit-difference. (I don't know if MIPS has clz / ctz count leading / trailing zeros instructions, but if it did you could use them this way.)
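The XOR idea can be sketched in C as follows. All names here are hypothetical, and the bit counts are done with plain loops so the example does not depend on compiler builtins. As far as I know, MIPS32 does define a clz instruction but has no direct ctz, so on real hardware the little-endian case would need an extra step or two.

```c
#include <stdint.h>

/* Count trailing zero bits; x must be nonzero. */
int ctz32(uint32_t x) {
    int n = 0;
    while ((x & 1u) == 0) { x >>= 1; n++; }
    return n;
}

/* Count leading zero bits; x must be nonzero. */
int clz32(uint32_t x) {
    int n = 0;
    while ((x & 0x80000000u) == 0) { x <<= 1; n++; }
    return n;
}

/* Index (0..3, in memory order) of the first differing byte between
   two words that were loaded on a little-endian machine, or -1 if equal.
   The lowest address is the least significant byte, so look for the
   lowest set bit of the XOR. */
int first_diff_le(uint32_t a, uint32_t b) {
    uint32_t d = a ^ b;
    return d == 0 ? -1 : ctz32(d) / 8;
}

/* Same for words loaded on a big-endian machine: the lowest address is
   the most significant byte, so look for the highest set bit. */
int first_diff_be(uint32_t a, uint32_t b) {
    uint32_t d = a ^ b;
    return d == 0 ? -1 : clz32(d) / 8;
}
```

For example, "HELL" and "HELP" load as 0x4C4C4548 and 0x504C4548 on a little-endian machine, and as 0x48454C4C and 0x48454C50 on a big-endian one. Both helpers report byte index 3, the position of the differing character in the string.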