I am working on a binary analysis project where I am building a lifter that translates assembly to LLVM. I built a memory model, but I am a bit confused about how the str and ldr ARM assembly instructions work on memory. So my question is: given a memory address, 0000b8f0 for example, at which I would like to store a 64-bit value of decimal 20000000, does the str instruction store the entire 20000000 at address 0000b8f0, or does it divide it into bytes and store the first byte at 0000b8f0, the 2nd byte at 0000b8f1, the 3rd byte at 0000b8f2, and so on? The same goes for loading from an address (0000b8f0): does the ldr instruction take just the byte stored at 0000b8f0, or the full set of bytes from 0000b8f0-0000b8f4?
Sorry if my question is very basic, but I need to make sure I correctly implement the effects of str and ldr on my memory model.
Logically (see footnote 1), memory is an array of 8-bit bytes.
Word loads/stores access more than one byte at once, just like SIMD intrinsics in C, or like the opposite of ((char*)&my_int)[2] to load the 3rd byte of an int.
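As a rough illustration of that byte-level view, here is a minimal C sketch. The helper name byte_of_u32 is made up for this example; which byte you get at a given index depends on the endianness of the machine running it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return byte number `index` (0 = lowest address)
   of a 32-bit object's representation. Reading through unsigned char*
   is allowed by ISO C, so this does not violate strict aliasing. */
static uint8_t byte_of_u32(const uint32_t *value, size_t index)
{
    assert(index < sizeof *value);
    return ((const unsigned char *)value)[index];
}
```

For a variable holding 0x11223344, index 0 would give 0x44 on a little-endian host and 0x11 on a big-endian one.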
C's memory model was designed around a byte-addressable machine that supports wider accesses (like PDP-11 or ARM), so it's what you're used to if you understand how char* works in C for accessing the object-representation of other objects, e.g. why memcpy (see footnote 2) works.
(I didn't use a C example of pointing an int* at a char array because the strict-aliasing rule in C makes that undefined behaviour. Only char* is allowed to alias other types in ISO C. Asm has well-defined behaviour for accessing bytes of memory with any width, with any partial or full overlap with earlier stores, as does GNU C when compiled with -fno-strict-aliasing to disable type-based alias analysis / optimization.)
str is a 32-bit word store; it writes all 4 bytes at once. If you were to load a byte from 0000b8f1, ...2, or ...3, you'd get the 2nd, 3rd, or 4th byte, so str is equivalent to 4 separate strb instructions (with shifts to extract the right bytes), except that the strb sequence would obviously be slower and not atomic.
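For a lifter's memory model, that equivalence could be sketched roughly like this in C. Everything here (Memory, MEM_SIZE, store8, load8, store32_le, load32_le) is a hypothetical name for illustration, and the sketch assumes a little-endian target and ignores alignment checks and atomicity.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_SIZE 0x10000u  /* hypothetical size of the modelled address space */

typedef struct {
    uint8_t bytes[MEM_SIZE];  /* one array element per byte address */
} Memory;

/* Model of strb: write the low 8 bits of value to a single address. */
static void store8(Memory *m, uint32_t addr, uint32_t value)
{
    assert(addr < MEM_SIZE);
    m->bytes[addr] = (uint8_t)(value & 0xFFu);
}

/* Model of ldrb: read one byte, zero-extended to 32 bits. */
static uint32_t load8(const Memory *m, uint32_t addr)
{
    assert(addr < MEM_SIZE);
    return m->bytes[addr];
}

/* Model of a little-endian str: four byte stores, shifting the value
   so that the least-significant byte lands at the lowest address. */
static void store32_le(Memory *m, uint32_t addr, uint32_t value)
{
    for (uint32_t i = 0; i < 4; i++)
        store8(m, addr + i, value >> (8 * i));
}

/* Model of a little-endian ldr: reassemble the word from four bytes. */
static uint32_t load32_le(const Memory *m, uint32_t addr)
{
    uint32_t value = 0;
    for (uint32_t i = 0; i < 4; i++)
        value |= load8(m, addr + i) << (8 * i);
    return value;
}
```

With this model, storing 20000000 (0x01312D00) at 0000b8f0 leaves 0x00 at 0000b8f0, 0x2D at 0000b8f1, 0x31 at 0000b8f2, and 0x01 at 0000b8f3, and a 32-bit load from 0000b8f0 gives back 20000000.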
str always stores 4 bytes from a 32-bit register. If a register holds a value like 2, that means the upper bytes are all zero.
ARM can be big- or little-endian. I think modern ARM systems are most often little-endian, like x86, so the least-significant byte of a value is stored at the lowest address.
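To make the byte-order difference concrete, here is a small C sketch that writes the same 32-bit value in either order. The function name put_u32 and its big_endian flag are assumptions for illustration, not anything ARM-specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: write a 32-bit value into 4 consecutive bytes.
   Little-endian puts the least-significant byte at dst[0];
   big-endian puts the most-significant byte at dst[0]. */
static void put_u32(uint8_t dst[4], uint32_t value, bool big_endian)
{
    assert(dst);
    for (unsigned i = 0; i < 4; i++) {
        unsigned shift = big_endian ? 8 * (3 - i) : 8 * i;
        dst[i] = (uint8_t)(value >> shift);
    }
}
```

For 20000000 (0x01312D00), the little-endian layout is 00 2D 31 01 from lowest to highest address, and the big-endian layout is 01 31 2D 00.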
The byte at 0000b8f0 can't hold 20000000 on its own; a byte isn't that large, if that's what you're asking.
Note that 0000b8f4 is the first byte of the next word (it's a 4-byte-aligned address), so a word access at 0000b8f0 covers 0000b8f0 through 0000b8f3.
Also, storing an int64_t with the value 20000000 would require two 32-bit stores, e.g. two str instructions, or an ARMv8 (AArch64) stp to do a 64-bit store of a pair of registers, or an stm store-multiple instruction with two registers. Or eight strb byte-store instructions.
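A minimal C sketch of the two-word version, assuming a little-endian layout where the low 32 bits go at the lower address; put_u32_le and put_u64_le are made-up helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: little-endian 32-bit store into a byte buffer. */
static void put_u32_le(uint8_t *dst, uint32_t value)
{
    assert(dst);
    for (unsigned i = 0; i < 4; i++)
        dst[i] = (uint8_t)(value >> (8 * i));
}

/* A 64-bit store modelled as two 32-bit stores:
   low half at dst, high half 4 bytes further on. */
static void put_u64_le(uint8_t *dst, uint64_t value)
{
    put_u32_le(dst, (uint32_t)value);
    put_u32_le(dst + 4, (uint32_t)(value >> 32));
}
```

For 20000000 the high word is zero, so the upper four bytes are all 0x00.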
Footnote 1: That's from a software PoV, not how memory controllers, data busses, DRAM chips, or even caches are physically organized. Thus byte stores, and sometimes even byte loads, can be less efficient than whole words on ARM, even apart from only moving 1/4 or 1/8th the amount of data as str or stp.
Footnote 2: memcpy(pointer, &tmp, sizeof(uint32_t)) is a portable way in C to describe a 4-byte store; on byte-addressable machines sizeof(uint32_t) == 4. memcpy copies between two memory locations in the C abstract machine, but in practice compilers can optimize a 4-byte variable into a register and optimize that memcpy to an str instruction, using an addressing mode to generate the pointer address. See also Why does unaligned access to mmap'ed memory sometimes segfault on AMD64? re: alignment and strict-aliasing considerations to keep C compilers happy. Strict-aliasing isn't a thing in asm since there's no further optimization, just translation to machine code (by an assembler.)
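The memcpy idiom from footnote 2 could look something like this; store_u32 and load_u32 are hypothetical wrapper names, and the bytes end up in the host's native byte order:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wrapper: store 4 bytes at p, which may be unaligned
   and may point into an object of any type. */
static void store_u32(void *p, uint32_t value)
{
    assert(p);
    memcpy(p, &value, sizeof value);
}

/* Hypothetical wrapper: load 4 bytes from p the same way. */
static uint32_t load_u32(const void *p)
{
    uint32_t value;
    assert(p);
    memcpy(&value, p, sizeof value);
    return value;
}
```

Because the access goes through memcpy, it stays well-defined in ISO C even when p is misaligned for uint32_t or points into a char array.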