
Why does data smaller than the CPU word size need to be aligned at a multiple of its own size?

Let's assume a 32-bit CPU and a 2-byte short. I know a 4-byte int needs to be aligned at an address that is a multiple of 4 to avoid extra reads.

Questions:

If a short is stored at 0x1, the CPU can still read it from 0x0 in one operation. So why do shorts need to be aligned at an address that is a multiple of 2?
If a short is stored at 0x2, why is it considered aligned and efficient, since the CPU can only read from 0x0 and has to discard the first two bytes?

There is a very similar question; however, its answer only tells us that the alignment requirement for a short is the same inside a struct as for a standalone variable. There is also a comment with 2 upvotes saying:

On many machines, accessing an N-byte quantity (at least for N in {1, 2, 4, 8, 16}) works most efficiently when the quantity is N-byte aligned. It's the way life is; get used to it because I doubt that chip manufacturers are going to change it just because you think it isn't the way it should be.

But why?

Solution

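Most machines are designed with memory that is addressed using a combination of a "word" address, which identifies a group of two or more bytes, and byte-select lines, which indicate which bytes within that word are being accessed. For operations that are a word or smaller, all the bytes within one word can be accessed simultaneously. Operations larger than a word always need to be split into multiple operations, and most CPUs don't care about the alignment of chunks larger than the word size. Some CPUs can split a word-size-or-smaller operation that would touch parts of two consecutive words into smaller operations, but that ability is not universal.

Applied to the question: with 4-byte words, a short at 0x2 occupies bytes 2 and 3 of the word at 0x0, so it is fetched in a single access with only those two byte lanes selected; the other two bytes cost nothing extra. A short at 0x1 also happens to fit inside one word, but a short at 0x3 would span the words at 0x0 and 0x4. Because the word size is a multiple of 2, requiring 2-byte alignment is the simplest rule that guarantees a short never crosses a word boundary, whatever its address.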

The standard guarantees that, for any power of two N, a multiple-of-N offset into an allocation that is suitably aligned for N-byte objects yields an address that is also suitably aligned for N-byte objects. It does not guarantee that platforms with smaller word sizes will tolerate looser alignments, because:

The Standard deliberately waives jurisdiction over non-portable constructs.
Implementations that want to offer stronger guarantees are free to do so, and compiler writers who want to uphold the Spirit of C will do so absent any reason to do otherwise.
Even on 8-bit platforms there may be advantages to requiring word alignment, although, ironically, I'm not aware of any implementation that ever did so on the 8-bit platforms where it would have been most useful. For example, on the Z80, the common way to load DE with a 16-bit value whose address is in HL is:
```
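 ld e,(hl)   ; E = low byte of the 16-bit value at (HL)   7 T-states
 inc hl      ; HL = address of the high byte              6 T-states
 ld d,(hl)   ; D = high byte                              7 T-states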
```

but if HL were known to be even, the second instruction could be replaced by `inc l`: an even L is at most 0xFE, so incrementing it can never carry into H. `inc l` is two cycles faster than `inc hl`, cutting the total time from 20 cycles to 18. Not a huge performance win, but if an application never used odd addresses for word-sized values, it would represent "low-hanging fruit".