I wrote this simple piece of 32-bit assembly code:
movl $0x542412e6, %eax
movl %ebp , %edx
addl $0x30, %edx
movl %edx, %ebp
pushl $0x08048dd6
ret
When I run this command:
gcc -m32 -c e.s
I get the following 18 bytes:
0: b8 e6 12 24 54 mov $0x542412e6,%eax
5: 89 ea mov %ebp,%edx
7: 83 c2 30 add $0x30,%edx
a: 89 d5 mov %edx,%ebp
c: 68 d6 8d 04 08 push $0x8048dd6
11: c3 ret
Why is the object code 18 bytes and not 20 or 16? Shouldn't it always be in 4-byte words for a 32-bit machine?
Instruction size is not related to the width of the data bus or the address bus. Some 16-bit x86 CPUs (the 8088, for example) combine three completely different sizes: 16-bit registers, an 8-bit data bus and a 20-bit address bus, along with variable-length instructions. Modern 32-bit and 64-bit x86 CPUs have variable-length instructions too, for backward compatibility.
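To see where the 18 comes from, here is a small Python sketch that counts the machine-code bytes of each instruction. The byte strings are copied from the objdump listing in the question; the `listing` and `instruction_lengths` names are made up for this illustration.

```python
# Machine-code bytes copied from the objdump listing in the question.
listing = [
    ("mov $0x542412e6,%eax", "b8 e6 12 24 54"),
    ("mov %ebp,%edx", "89 ea"),
    ("add $0x30,%edx", "83 c2 30"),
    ("mov %edx,%ebp", "89 d5"),
    ("push $0x8048dd6", "68 d6 8d 04 08"),
    ("ret", "c3"),
]


def instruction_lengths(rows):
    """Return the encoded length in bytes of each instruction."""
    return [len(bytes.fromhex(hex_bytes)) for _, hex_bytes in rows]


lengths = instruction_lengths(listing)
total = sum(lengths)

for (text, _), n in zip(listing, lengths):
    print(f"{n} byte(s): {text}")
print("total:", total)
```

The lengths come out as 5, 2, 3, 2, 5 and 1 bytes, which add up to 18. None of them is 4, and the total has no reason to be a multiple of 4.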
Just look at the movl $0x542412e6, %eax
and pushl $0x08048dd6
lines and you'll see that it's impossible to fit a 32-bit immediate plus the opcode (and, for movl, the destination register) into 32 bits. An architecture with fixed-length 32-bit instructions must therefore use multiple instructions or a literal pool to load a 32-bit constant.
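As a rough illustration of the "multiple instructions" approach, the Python sketch below models how a fixed-width ISA might build the same constant in two steps, loosely in the style of a MIPS lui/ori pair. The function names are hypothetical and only mimic the arithmetic; this is not real machine code.

```python
MASK32 = 0xFFFFFFFF


def split_constant(value):
    """Split a 32-bit constant into its upper and lower 16-bit halves."""
    value &= MASK32
    return value >> 16, value & 0xFFFF


def load_upper(hi):
    """Model a 'load upper immediate' step: hi goes into bits 31..16."""
    return (hi << 16) & MASK32


def or_immediate(reg, lo):
    """Model an 'or immediate' step: fill bits 15..0 with lo."""
    return reg | (lo & 0xFFFF)


hi, lo = split_constant(0x542412E6)
reg = or_immediate(load_upper(hi), lo)
print(hex(hi), hex(lo), hex(reg))
```

Each step carries only a 16-bit immediate, so it can share a 32-bit instruction word with an opcode and register fields. The cost is two instructions where x86 uses a single 5-byte one.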
RISC architectures often have fixed-width instructions as a trade-off between code density and decoder simplicity. But 32-bit RISC architectures whose instruction size is not 32 bits also exist. For example, MIPS16e and ARM Thumb (v1) have 16-bit instructions, whereas ARM Thumb-2 and the Dalvik VM have variable-length instructions. Modern 64-bit RISC architectures don't have 64-bit instructions either; they usually stick with the 32-bit size.