mips cpu cpu-architecture endianness

Endianness when it comes to widening/narrowing data


I've been reading about endianness again, after a few months with MIPS. I'm a little confused about when it matters for loads from and stores to memory, so can someone verify whether my understanding is correct? I don't have a big-endian machine to test on, and for some reason I can't get qemu to work.

Example 1:

    lw $t0,word_ 
    word_: .word 0xAABBCCDD # behaves the same in both Big Endian and Little Endian

Example 2:
    
    lw $t0,bytearr_
    bytearr_ : .byte 0xAB, 0xCD, 0xEF, 0xAA # either 0xABCDEFAA on BE or 0xAAEFCDAB on LE (?)

Example 3:
    
    lhw $t0,b2hw
    b2hw : .byte 0xAB, 0xCD # can this lead to issues as well? (LE is 0xCDAB, BE is 0xABCD)

Please correct me if I'm wrong or if I'm missing any potential conversion that could go wrong from one endianness to the other. Thanks!

EDIT: What is going to happen in the case of LE/BE if I attempt to load a word into a halfword or a halfword into a word? For instance lw $t0, hw_ where hw_: .half 0xABCD and lhw $t0, w_ where w_: .word 0xAABBCCDD


Solution

  • In Example 1 the assembler is taking care of how to arrange the word in memory.

    If the architecture is little endian the lowest address will hold 0xDD, then 0xCC, then 0xBB and then 0xAA.

    If the architecture is big endian it will be the other way round: first 0xAA, then 0xBB, then 0xCC and then 0xDD. So when you issue lw $t0, word_ you get the value you expect (0xAABBCCDD).

    In your second example you are defining an array of bytes, so the assembler must obey your ordering. The lowest address will hold 0xAB, then 0xCD, then 0xEF, then 0xAA.

    So when you issue lw $t0,bytearr_ you will get different results depending on whether your architecture is little endian or big endian.

    If your architecture is little endian you end up with $t0=0xAAEFCDAB and if your architecture is big endian you end up with $t0=0xABCDEFAA.
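
The difference between Examples 1 and 2 can be checked without a big-endian machine. The following is a minimal Python sketch, not MIPS code: `store_word` and `load_word` are hypothetical helpers that model what `.word` and `lw` do, on the assumption that the assembler and the CPU agree on byte order.

```python
def store_word(value, byteorder):
    """Model of `.word`: emit 4 bytes in the target's byte order."""
    return value.to_bytes(4, byteorder)

def load_word(memory, byteorder):
    """Model of `lw`: combine 4 bytes, lowest address first."""
    return int.from_bytes(memory[:4], byteorder)

if __name__ == "__main__":
    for order in ("little", "big"):
        # Example 1: the byte layout follows the target's endianness.
        word_mem = store_word(0xAABBCCDD, order)
        # Example 2: the byte layout is fixed by the source file.
        byte_mem = bytes([0xAB, 0xCD, 0xEF, 0xAA])
        print(order,
              "| .word bytes:", word_mem.hex(),
              "| lw word_:", hex(load_word(word_mem, order)),
              "| lw bytearr_:", hex(load_word(byte_mem, order)))
```

In this model `lw word_` yields 0xaabbccdd for both byte orders, while `lw bytearr_` yields 0xaaefcdab for "little" and 0xabcdefaa for "big".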

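    The third example is similar to the second. You define an array of bytes, so the lowest address holds 0xAB and the next one holds 0xCD. Issuing lhw $t0, b2hw gives $t0=0xCDAB if the architecture is little endian and $t0=0xABCD if it is big endian. (MIPS has no lhw mnemonic; the halfword loads are lh, which sign-extends, and lhu, which zero-extends. The values here are the zero-extended ones that lhu would give. Since bit 15 is set in both, lh would produce 0xFFFFCDAB and 0xFFFFABCD.)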

    If you wish to let the assembler manage the arrangement then you would use the directive .half, like so:

    lhw $t0,b2hw
    b2hw : .half 0xABCD # let the assembler figure out how to arrange this halfword in memory
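
The halfword case has one extra wrinkle worth simulating: real MIPS offers `lh` (sign-extending) and `lhu` (zero-extending) rather than `lhw`. The Python sketch below is a model under that assumption; `load_half` and `store_half` are hypothetical helpers, not part of any MIPS toolchain.

```python
def store_half(value, byteorder):
    """Model of `.half`: emit 2 bytes in the target's byte order."""
    return value.to_bytes(2, byteorder)

def load_half(memory, byteorder, signed):
    """Model of `lh` (signed=True) or `lhu` (signed=False).

    Returns the 32-bit register contents as an unsigned integer.
    """
    value = int.from_bytes(memory[:2], byteorder, signed=signed)
    return value & 0xFFFFFFFF  # keep the 32-bit two's-complement pattern

if __name__ == "__main__":
    pair = bytes([0xAB, 0xCD])  # `.byte 0xAB, 0xCD`: order fixed by the source
    for order in ("little", "big"):
        half = store_half(0xABCD, order)  # `.half 0xABCD`: order follows the target
        print(order,
              "| lhu bytes:", f"{load_half(pair, order, False):#010x}",
              "| lh bytes:", f"{load_half(pair, order, True):#010x}",
              "| lhu .half:", f"{load_half(half, order, False):#010x}")
```

With `.byte` the loaded value depends on the byte order, while with `.half` the zero-extended load gives 0x0000abcd either way.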
    

    Your final question asks what happens when you "attempt to load a word into a halfword or a halfword into a word".

    The answer is that you don't really load a word into a halfword or a halfword into a word. You load a word or a halfword starting at some address. So if you have the following example:

    hw_: .half 0xABCD
    w_: .word 0xAABBCCDD
    

    this code:

    lw $t0, hw_
    

    will load a word starting at the address of hw_, and

    lhw $t0, w_
    

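    will load a halfword starting at the address of w_. Note that many MIPS assemblers align .word to a 4-byte boundary, and lw itself expects a 4-byte-aligned address, so in practice two padding bytes may sit between hw_ and w_. Assuming the two directives are placed back to back with no padding, the arrangement in memory would be (from smaller addresses to larger ones):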

    If it's little endian:

    0xCD  ; hw_
    0xAB
    0xDD  ; w_
    0xCC
    0xBB
    0xAA 
    

    so if you issue lw $t0, hw_ you would get 0xCCDDABCD, and with lhw $t0, w_ you would get 0xCCDD.

    And if it was big endian:

    0xAB  ; hw_
    0xCD  
    0xAA  ; w_
    0xBB
    0xCC
    0xDD  
    

    so if you issue lw $t0, hw_ you would get 0xABCDAABB, and with lhw $t0, w_ you would get 0xAABB.
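
Both layouts can be reproduced with a small Python model. This is only a sketch: `layout` and `load` are hypothetical helpers, and the `pad_to_word` flag models the assumption that an assembler aligns `.word` to a 4-byte boundary, which the walkthrough above does not assume.

```python
def layout(byteorder, pad_to_word=False):
    """Bytes for `hw_: .half 0xABCD` followed by `w_: .word 0xAABBCCDD`.

    Returns (memory, offset_of_w). hw_ is always at offset 0.
    """
    mem = (0xABCD).to_bytes(2, byteorder)
    if pad_to_word:
        mem += bytes(2)  # assumption: two zero bytes of alignment padding
    w_offset = len(mem)
    mem += (0xAABBCCDD).to_bytes(4, byteorder)
    return mem, w_offset

def load(mem, offset, size, byteorder):
    """Model of a zero-extending load of `size` bytes at `offset`."""
    return int.from_bytes(mem[offset:offset + size], byteorder)

if __name__ == "__main__":
    for pad in (False, True):
        for order in ("little", "big"):
            mem, w_ = layout(order, pad)
            print("padded" if pad else "packed", order,
                  "| lw hw_:", f"{load(mem, 0, 4, order):#010x}",
                  "| halfword load at w_:", f"{load(mem, w_, 2, order):#06x}")
```

In the packed case the model reproduces the values above (0xccddabcd and 0xccdd for little endian, 0xabcdaabb and 0xaabb for big endian). In the padded case the word load at hw_ sees the padding instead: 0x0000abcd for little endian and 0xabcd0000 for big endian.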