assembly memory arm cpu-architecture memory-address

How does storing into and loading from memory work? Which addresses are affected when you store a 32-bit word?

I am working on a binary analysis project where I am building a lifter that translates assembly to LLVM. I built a memory model, but I'm a bit confused about how the ARM str and ldr instructions operate on memory. So my question is: given a memory address such as 0000b8f0, at which I would like to store a 64-bit decimal value of 20000000, does the str instruction store the entire 20000000 at address 0000b8f0, or does it divide it into bytes and store the first byte at 0000b8f0, the second byte at 0000b8f1, the third byte at 0000b8f2, and so on? The same goes for loading from an address (0000b8f0): does the ldr instruction take just the byte stored at 0000b8f0, or the full set of bytes from 0000b8f0-0000b8f4?

Sorry if my question is very basic, but I need to make sure I correctly implement the effects of str and ldr on my memory model.

Solution

Logically¹, memory is an array of 8-bit bytes.

Word loads and stores access more than one byte at once, just like SIMD intrinsics in C, or like the opposite of ((char*)&my_int)[2] to load the 3rd byte of an int.
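To make the byte-wise view concrete, here is a minimal C sketch that reads the object representation of a 32-bit value one byte at a time through an unsigned char pointer. The function name bytes_of_u32 is made up for illustration, and the order of the bytes you get depends on the endianness of the machine that runs it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy the 4 bytes that make up `value` into out[0..3],
   lowest address first. unsigned char* is allowed to alias any object,
   so this is well-defined ISO C. */
void bytes_of_u32(uint32_t value, unsigned char out[4])
{
    assert(out != NULL);
    const unsigned char *p = (const unsigned char *)&value;
    for (size_t i = 0; i < sizeof value; i++)
        out[i] = p[i];   /* p[2] is the "3rd byte" mentioned above */
}
```

For 20000000 (0x01312D00), a little-endian host fills out with 00 2D 31 01 and a big-endian host with 01 31 2D 00.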

C's memory model was designed around a byte-addressable machine that supports wider accesses (like PDP-11 or ARM), so it's what you're used to if you understand how char* works in C for accessing the object-representation of other objects, e.g. why memcpy² works.

(I didn't use a C example of pointing an int* at a char array because the strict-aliasing rule in C makes that undefined behaviour. Only char* is allowed to alias other types in ISO C. Asm has well-defined behaviour for accessing bytes of memory with any width, with any partial or full overlap with earlier stores, as does GNU C when compiled with -fno-strict-aliasing to disable type-based alias analysis / optimization.)

str is a 32-bit word store; it writes all 4 bytes at once. If you were to then load from 0000b8f1, ...2, or ...3, you'd get the 2nd, 3rd, or 4th byte, so str is equivalent to 4 separate strb instructions (with shifts to extract the right bytes), except that the four byte stores obviously lack the atomicity and performance of a single str.
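For a lifter's memory model, that equivalence suggests modelling memory as a byte array and building word accesses out of byte accesses. The following is only a sketch under stated assumptions: a little-endian target, a flat 64 KiB address space, and made-up names (Memory, mem_store8, mem_store32, and so on). It ignores atomicity, alignment faults and MMIO.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_SIZE 0x10000u   /* assumed size; large enough to include 0x0000b8f0 */

typedef struct {
    uint8_t bytes[MEM_SIZE];   /* one array element per byte address */
} Memory;

/* Model of strb: writes exactly one byte. */
void mem_store8(Memory *m, uint32_t addr, uint8_t value)
{
    assert(addr < MEM_SIZE);
    m->bytes[addr] = value;
}

/* Model of ldrb: reads exactly one byte. */
uint8_t mem_load8(const Memory *m, uint32_t addr)
{
    assert(addr < MEM_SIZE);
    return m->bytes[addr];
}

/* Model of str (little-endian): 4 byte stores at addr .. addr+3,
   least-significant byte at the lowest address. */
void mem_store32(Memory *m, uint32_t addr, uint32_t value)
{
    for (uint32_t i = 0; i < 4; i++)
        mem_store8(m, addr + i, (uint8_t)(value >> (8 * i)));
}

/* Model of ldr (little-endian): reassemble the word from 4 byte loads. */
uint32_t mem_load32(const Memory *m, uint32_t addr)
{
    uint32_t value = 0;
    for (uint32_t i = 0; i < 4; i++)
        value |= (uint32_t)mem_load8(m, addr + i) << (8 * i);
    return value;
}
```

With this model, mem_store32(&m, 0xb8f0, 20000000) leaves 00 2D 31 01 at 0xb8f0 through 0xb8f3 and does not touch 0xb8f4.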

str always stores 4 bytes from a 32-bit register. If a register holds a value like 2, that means the upper bytes are all zero.

ARM can be big- or little-endian. I think modern ARM systems are most often little-endian, like x86, so the least-significant byte of a value is stored at the lowest address.
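If your lifter has to handle both byte orders, the only thing that changes is which shift goes with which address. Here is a small illustrative sketch; put_u32 and its big_endian flag are names I made up, not anything from ARM or LLVM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write `value` into dst[0..3] in the requested byte order.
   Little-endian: least-significant byte at dst[0].
   Big-endian:    most-significant byte at dst[0]. */
void put_u32(uint8_t *dst, uint32_t value, bool big_endian)
{
    assert(dst != NULL);
    for (unsigned i = 0; i < 4; i++) {
        unsigned shift = big_endian ? 8 * (3 - i) : 8 * i;
        dst[i] = (uint8_t)(value >> shift);
    }
}
```

For 20000000 (0x01312D00) this gives 00 2D 31 01 in little-endian order and 01 31 2D 00 in big-endian order.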

The byte at 0000b8f0 can't hold 20000000 on its own; a byte isn't that large, if that's what you're asking.

Note that 0000b8f4 is the low byte of the next word; it's a 4-byte-aligned address. A 4-byte word at 0000b8f0 covers only 0000b8f0 through 0000b8f3.

Also, storing an int64_t with 20000000 would require two 32-bit stores, e.g. two str instructions, or an ARMv8 (AArch64) stp to do a 64-bit store of a pair of registers, or an stm store-multiple instruction with two registers. Or eight strb byte-store instructions.
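In a byte-array memory model, a 64-bit store can be sketched as two 32-bit stores to adjacent words. This assumes a little-endian target, so the low half goes at the lower address; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian 32-bit store into a flat byte array (models one str). */
static void store_word_le(uint8_t *mem, size_t addr, uint32_t value)
{
    assert(mem != NULL);
    for (size_t i = 0; i < 4; i++)
        mem[addr + i] = (uint8_t)(value >> (8 * i));
}

/* 64-bit store as two 32-bit stores: low half at addr, high half at addr + 4.
   The caller must make sure addr .. addr+7 is inside the array. */
void store_doubleword_le(uint8_t *mem, size_t addr, uint64_t value)
{
    store_word_le(mem, addr,     (uint32_t)(value & 0xFFFFFFFFu));
    store_word_le(mem, addr + 4, (uint32_t)(value >> 32));
}
```

Storing 20000000 as a 64-bit value writes 00 2D 31 01 followed by four zero bytes, since the upper half of the value is all zero.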

Footnote 1: That's from a software PoV, not how memory controllers, data busses, DRAM chips, or even caches are physically organized. As a result, byte stores and sometimes even byte loads can be less efficient than whole-word accesses on ARM, even apart from only moving 1/4 or 1/8th the amount of data as str or stp.

Footnote 2: memcpy(pointer, &tmp, sizeof(uint32_t)) is a portable way in C to describe a 4-byte store; on byte-addressable machines with 8-bit bytes, sizeof(uint32_t) == 4. memcpy copies between two memory locations in the C abstract machine, but in practice compilers can optimize a 4-byte variable into a register and optimize that memcpy to an str instruction, using an addressing mode to generate the pointer address. See also Why does unaligned access to mmap'ed memory sometimes segfault on AMD64? re: alignment and strict-aliasing considerations to keep C compilers happy. Strict-aliasing isn't a thing in asm since there's no further optimization, just translation to machine code (by an assembler).
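As a sketch of that memcpy idiom, the two wrappers below (store_u32 and load_u32 are names chosen for illustration) do a 4-byte store and load at any address, aligned or not, without violating strict aliasing. Whether a given compiler actually turns them into a single str or ldr depends on the target and optimization settings.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 4-byte store to an arbitrary address, in the host's native byte order. */
void store_u32(void *dst, uint32_t value)
{
    assert(dst != NULL);
    memcpy(dst, &value, sizeof value);
}

/* 4-byte load from an arbitrary address, in the host's native byte order. */
uint32_t load_u32(const void *src)
{
    assert(src != NULL);
    uint32_t value;
    memcpy(&value, src, sizeof value);
    return value;
}
```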