Tags: assembly, arm, cpu-architecture, endianness

ARM endianness and byte ordering for .ascii vs .word


I just started learning ARM assembly. I am on 32-bit Raspbian with "GNU assembler version 2.35.2 (arm-linux-gnueabihf)".

Here is my simple program, which loads part of an ASCII string into a register:

.global _start
_start:
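    ldr r1,=helloworld      @ r1 = address of the string
    ldr r2,[r1]             @ r2 = 32-bit word at that address (first 4 bytes)

    @ prepare to exit
    mov r0,#0               @ exit status 0
    mov r7,#1               @ syscall number 1 = exit (ARM EABI Linux)
    svc 0                   @ invoke the kernel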

.data
helloworld:
    .ascii "HelloWorld"

I loaded it into gdb and can see that register r2 holds 0x6c6c6548 ("lleH" in ASCII). A quick objdump shows:

Contents of section .data:
 0000 48656c6c 6f576f72 6c64               HelloWorld

I have the following questions:

  1. What does the string look like in memory? In other words, when does endianness come into the picture? Is the string reversed when it is stored in memory, or is it stored as-is and reversed when it is loaded into a register?
  2. Why does register r2 contain 0x12345678 instead of 0x78563412 in the program below, which uses .word? Why is no byte reversal visible here?

Note: this program uses .word instead of .ascii.

.global _start
_start:
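    ldr r1,=helloworld      @ r1 = address of the word
    ldr r2,[r1]             @ r2 = 32-bit value stored there
    mov r0,#0               @ exit status 0
    mov r7,#1               @ syscall number 1 = exit
    svc 0                   @ invoke the kernel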

.data
helloworld:
    .word 0x12345678

EDIT

The memory dump for the first program shows that the string is stored in memory in the same order as in the source code and the object file:

>>> x/32xb 0x1008c
0x1008c:    0x48    0x65    0x6c    0x6c    0x6f    0x57    0x6f    0x72
0x10094:    0x6c    0x64    0x41    0x11    0x00    0x00    0x00    0x61

This suggests that the ldr instruction interprets the memory it reads as little-endian, so the least significant byte of the register holds the first byte in memory. Is my understanding correct? Even so, this does not explain why the same thing does not happen with .word.


Solution

  • Endianness, or byte order, is the order in which the bytes that make up a multi-byte number are stored in memory.

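    A string is an array of bytes stored at consecutive addresses in source order. Endianness only concerns the order of bytes within a multi-byte value; a single byte has only one possible order, so little- and big-endian come out the same. The apparent reversal happens when ldr reads four of those bytes as one 32-bit word: on little-endian ARM the byte at the lowest address ('H', 0x48) becomes the least significant byte of the register, so r2 reads 0x6c6c6548. So nothing is reversed when the string is placed in memory; the byte order is applied by the load, as you suspected in your edit.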

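    For your second question: endianness only affects how data is laid out in memory. Assembly source is a human-readable representation of the program, and the token 0x12345678 stands for a number. When the assembler writes that number into the object file, it stores its bytes in the target's byte order: least significant byte first on little-endian ARM, so `objdump -s` of the second program would show 78563412. The assembler takes care of this for you, and the little-endian ldr undoes the same mapping when loading, so r2 ends up holding 0x12345678 again. The .ascii case differs because the assembler emits single bytes in source order, so there is no multi-byte value for it to reorder; only the load applies a byte order.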

    You will also see the register content as 0x12345678 when watching the program in a debugger. Registers are not part of memory and are not addressed byte by byte; each register simply holds a 32-bit number. The CPU applies the configured byte order (see the SETEND instruction) only when it transfers data between registers and memory. Since a register is not divided into addressable bytes, there is no meaningful way to assign it a byte order, so the debugger can only show its numeric value, which is exactly the value you assigned in your program. Crazy how this works, eh?