
Little-endian byte order (in C)


I've heard that in x86 processors, bytes are stored in the memory in little-endian byte order.

Meaning that the least significant byte gets stored first.

I'm having trouble grasping the idea and its relationship with how bytes get stored in RAM.

For example,

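#include <stdio.h>

int main(void)
{
    char string[6];        /* room for five characters plus the terminating '\0' */
    scanf("%5s", string);  /* read at most five non-whitespace characters */
    return 0;
}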

In the code above, if I input the word "Hello", does "o" get stored first?

From what I understand, in C (and in programming in general?) when you declare a variable, the variable gets stored in the stack portion of RAM. So the word "Hello" gets stored in the stack like this:


o    <Lower memory addresses>
l
l
e
H    <Higher memory addresses>

The stack grows from higher memory addresses towards the lower, and the processor starts reading the bytes starting from the first byte at the top of the stack (lower memory addresses).

Now if I print the value of the string, I should see "olleH".

But obviously it prints "Hello" instead.

Is this because of the little-endian byte order?


Solution

  • For simplicity, let’s discuss a machine in which each byte in memory has an address. (There are machines where memory is organized only as words with several bytes, not individual bytes.) In this machine, memory is like a big array, so we can write memory[37] to talk about the byte at address 37.

    How Characters Are Stored

    To store characters, we simply put them at successive memory locations, in order. For example, to store the characters “Hello” starting at address 100, we put H at memory[100], e at memory[101], l at memory[102], l at memory[103], and o at memory[104]. In some languages, including C, we also put a zero value at memory[105] to mark the end of the string.

    There is no endian issue here. Characters are in order.

    How Integers Are Stored

    Consider an integer like 5678. It will not fit into one eight-bit byte, which can hold only values from 0 to 255. In binary, 5678 is 10110 00101110 (space for readability). That requires at least two bytes to store: one byte containing 10110 (00010110 with leading zeros) and one byte containing 00101110.

    When we store it in memory starting at location 100, which byte do we put first? This is the endian issue. Some machines put the high-value byte (10110) in memory[100] and the low-value byte (00101110) in memory[101]. Other machines do it in the other order. The high-value byte is the “big end” of the number, and the low-value byte is the “little end,” leading to the term “endianness.” (The term actually comes from Jonathan Swift’s Gulliver’s Travels.)

    (This example uses only two bytes. Integers can also use four bytes, or more.)

    The endian issue arises whenever you have one object made out of smaller objects. This is why it is not a problem with individual characters: each character goes into one byte. (There is no physical reason you could not store strings in reverse order in memory. We just do not.) It becomes a problem only when an object has two or more bytes, because then you have to choose the order in which you put those bytes into memory.

    How Stacks Are Organized

    Common implementations of stacks start at a high address and “grow” downward when adding things to the stack. There is no particular reason for this; we can make stacks work the other way too. It is just how things developed historically.

    Stack growth largely occurs in chunks. When a function is called, it adds some space to the stack to make room for its local data. So it decreases the stack pointer by some amount and then uses that space.

    However, within that space, individual objects are stored normally. The fact that the stack grows down does not mean they need to be reversed. If the stack pointer changed from 2400 to 2200, and we now want to put an object at 2300, we just write its bytes to memory starting at 2300 and continuing upward (2301, 2302, and so on).

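    So endianness is not affected by stack order, and neither is the order of the characters in a string. In the question's example, H is stored at the lowest address of the array and o at the highest, which is why printing the string gives “Hello”.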