Relation of endianness to assembly conversion of size in C

Please note that the below is adapted from Problem 3.4 of Bryant and O'Hallaron's text (CSAPP3e). I have stripped away everything but my essential question.

Context: we are looking at an x86-64/Linux/gcc combo in which ints are 4 bytes and chars are signed (and, of course, 1 byte). We are interested in writing the assembly corresponding to the conversion of an int to a char, which, at a high level, we know is performed by truncation.

They present the following solution:

movl (%rdi), %eax            // Read 4 bytes
movb %al, (%rsi)             // Store low-order byte

My question is whether we can change the movl to a movb since, after all, we only use one byte in the end. My concern is that the read might be endian-dependent, and that we might somehow get the high-order bits if our processor/OS is in little-endian mode. Is this concern justified, or would my change work no matter what?

I would try this out but 1) I am on a Mac with Apple silicon and 2) even if my suspicion worked, I couldn't be sure if this sort of thing was implementation-dependent.

Solution

You're right to be concerned about endianness for this kind of operation, but you have the direction reversed: your alternative approach would work on little-endian machines and fail on big-endian ones.

x86 is little-endian, which means the low-order eight bits of a 32-bit integer are stored in the first (lowest-address) byte of that integer, so

movb (%rdi), %al     // Read low-order byte
movb %al, (%rsi)     // Store low-order byte

will do the truncation you want on x86. On a big-endian machine, however, a one-byte load from the same address would read the high-order eight bits of the 32-bit integer. The m68k architecture, for instance, is big-endian; a correct version of your alternative approach for that architecture would be

move.b 3(%a1), %d0   // Read low-order byte
move.b %d0, (%a0)    // Store low-order byte

Without the displacement of 3, it would read the high-order byte of the int pointed to by register %a1.
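To make the two byte offsets concrete, here is a minimal C sketch that lays out the same 32-bit value the way each kind of machine would store it in memory. The helper names store_le and store_be are hypothetical (they are not from CS:APP or any library), and the layout is simulated with shifts so the result does not depend on the machine you run it on.

```c
#include <stdint.h>

/* Hypothetical helper: fill out[] with the bytes of v in the order a
   little-endian machine stores them (low-order byte at offset 0). */
void store_le(uint32_t v, unsigned char out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)(v >> (8 * i));
}

/* Hypothetical helper: fill out[] with the bytes of v in the order a
   big-endian machine stores them (high-order byte at offset 0). */
void store_be(uint32_t v, unsigned char out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)(v >> (8 * (3 - i)));
}
```

For v = 0x11223344, store_le produces 44 33 22 11 and store_be produces 11 22 33 44, so the low-order byte 0x44 sits at offset 0 in the first layout and at offset 3 in the second. That is the same 0 versus 3 difference as in the two assembly listings.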

The virtue of doing it the way CS:APP does it is that the same construct will work correctly on both big- and little-endian architectures. Of course, if you're programming in assembly language you have to rewrite the code anyway to move the program to a different architecture, but it's one fewer thing to worry about while you're doing that.
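The same contrast can be sketched in C. This is only an illustration under stated assumptions: the function names are hypothetical, the host is assumed to be either little- or big-endian (no mixed byte orders), and for values that do not fit in a char the result of the conversion is implementation-defined in C (gcc and clang keep the low-order byte).

```c
#include <string.h>

/* Truncate by value: load the whole int, then keep its low-order byte.
   This mirrors the movl/movb pair and needs no knowledge of byte order. */
void int_to_char(const int *sp, char *dp) {
    *dp = (char)*sp;
}

/* Hypothetical helper: offset of the low-order byte within an int on the
   machine running this code (0 on little-endian, sizeof(int) - 1 on
   big-endian). */
size_t low_byte_offset(void) {
    unsigned int probe = 1;
    unsigned char bytes[sizeof probe];
    memcpy(bytes, &probe, sizeof probe);
    return bytes[0] == 1 ? 0 : sizeof probe - 1;
}

/* Truncate by address: load a single byte. This mirrors the movb/movb
   pair and is only correct if the offset matches the byte order. */
void int_to_char_bytewise(const int *sp, char *dp) {
    memcpy(dp, (const unsigned char *)sp + low_byte_offset(), 1);
}
```

The first function has nothing to get wrong when the code moves to another architecture. The second has to carry the offset computation with it, which is the extra thing to worry about.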

Compiler-generated code will probably also do it the CS:APP way, for related reasons: compilers usually do most of their work in an architecture-independent "intermediate representation" and then translate that to assembly language. That translation is one of the most complex phases of an industrial-grade compiler, for reasons beyond the scope of this answer, so compiler writers apply every simplifying assumption that doesn't make the generated code worse in order to make the translator easier to write.