
Why does 8-bit MUL combine into AX but 16 and 32-bit MUL leave their result split between [E]DX:[E]AX?

MUL CL: CL is a byte operand, so AX = AL * CL
MUL BX: BX is a word operand, so DX:AX = AX * BX
MUL EBX: EBX is a dword operand, so EDX:EAX = EAX * EBX

Why does the 8-bit MUL instruction put its result in AX and not in DL:AL?

Is it because the result of a 16-bit MUL could have gone in EAX and the result of a 32-bit MUL in RAX, but for a 64-bit mul rcx there wouldn't be any single register wide enough to hold the result?

Solution

The 8086 had 16-bit registers (AX, BX, ...). So an 8*8=16 widening multiply (that is, with 8-bit inputs and a 16-bit result) could fit its result in a single register. Splitting it across two registers would have been inconvenient and provided no benefit.

But the 16*16=32 widening multiply could not fit its result in a single register, because there were no 32-bit registers. By necessity it had to be split across two registers, and DX and AX were chosen.
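A minimal Python sketch of these placement rules, assuming registers can be modeled as plain unsigned integers (the names mul8 and mul16 are hypothetical, not from any library). The 8-bit form returns a single 16-bit value for AX, while the 16-bit form has to return a (DX, AX) pair:

```python
# Hypothetical model of where unsigned MUL leaves its result.
# Registers are plain Python ints; this is an illustration, not an emulator.

def mul8(al: int, src: int) -> int:
    """MUL r/m8: AX = AL * src. The 16-bit product fits in one register."""
    if not (0 <= al <= 0xFF and 0 <= src <= 0xFF):
        raise ValueError("operands must be 8-bit")
    return al * src  # the whole product becomes the new AX


def mul16(ax: int, src: int) -> tuple[int, int]:
    """MUL r/m16: DX:AX = AX * src. The 32-bit product needs two registers."""
    if not (0 <= ax <= 0xFFFF and 0 <= src <= 0xFFFF):
        raise ValueError("operands must be 16-bit")
    product = ax * src
    dx = product >> 16          # high 16 bits
    new_ax = product & 0xFFFF   # low 16 bits
    return dx, new_ax


if __name__ == "__main__":
    print(hex(mul8(0xFF, 0xFF)))      # 0xfe01
    dx, ax = mul16(0xFFFF, 0xFFFF)
    print(hex(dx), hex(ax))           # 0xfffe 0x1
```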

Likewise, the 386 had 32-bit registers (EAX, EBX, ...), so its 32*32=64 widening multiply had to have its result split. EDX:EAX was chosen for consistency with the 8086's DX:AX.

Intel could at this point have added a new version of the 16*16=32 MUL which left its result in a single 32-bit register such as EAX, but they chose not to, perhaps for compatibility or to avoid unnecessary extra complexity, or from simple inertia. So the 386's 16*16=32 MUL still leaves its result split across DX:AX, even in 32-bit mode.
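In practice this means code that wants a 16*16=32 product in one register has to gather the halves itself. The sketch below models one possible sequence, assuming 32-bit mode, where MUL r/m16 writes only DX and AX and leaves the upper halves of EDX and EAX unchanged. The instruction sequence in the comments and the helper name combine_dx_ax are illustrative, not the only way to do it:

```python
# Hypothetical model of gathering DX:AX into EAX after a 16-bit MUL.
# One possible x86 sequence (a sketch, not the only option):
#     mul   bx          ; DX:AX = AX * BX (upper halves of EDX/EAX untouched)
#     movzx eax, ax     ; clear EAX's stale upper half
#     shl   edx, 16     ; move DX into the upper half of EDX
#     or    eax, edx    ; EAX now holds the full 32-bit product

MASK16 = 0xFFFF
MASK32 = 0xFFFF_FFFF


def combine_dx_ax(edx: int, eax: int) -> int:
    """Model the movzx/shl/or sequence above on 32-bit register values."""
    low = eax & MASK16                 # movzx eax, ax
    high = (edx << 16) & MASK32        # shl edx, 16 (old upper bits shifted out)
    return low | high                  # or eax, edx


if __name__ == "__main__":
    # DX = 0xFFFE, AX = 0x0001 (i.e. 0xFFFF * 0xFFFF), with stale upper halves.
    print(hex(combine_dx_ax(0xDEAD_FFFE, 0xBEEF_0001)))  # 0xfffe0001
```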

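(They did, however, add a non-widening 32*32=32 form of the signed multiply IMUL instruction, which leaves its result in a single 32-bit register. One can use this for signed 16*16=32 multiplication by sign-extending the inputs, for which the convenient MOVSX was also added. It can also be used for unsigned 16*16=32 multiplication by zero-extending the inputs with MOVZX: the low 32 bits of a product are the same whether the operands are treated as signed or unsigned, and the product of two 16-bit values always fits in 32 bits, so the register receives the exact result. Only IMUL's overflow flags, which assume a signed result, will be set when the product is 2^31 or more.)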
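A rough Python model of the IMUL trick, assuming registers are 32-bit bit patterns stored in plain ints. The helper names are hypothetical, and CF/OF are not modeled. The last example uses a product above 2^31 to show that the 32-bit register value is still the exact unsigned product, because truncating to the low 32 bits does not depend on signedness:

```python
# Hypothetical model of IMUL r32, r/m32 combined with MOVSX/MOVZX
# to get a 16*16=32 product in a single register. Flags are not modeled.

MASK16 = 0xFFFF
MASK32 = 0xFFFF_FFFF


def movsx32_16(x: int) -> int:
    """MOVSX r32, r/m16: sign-extend a 16-bit pattern to 32 bits."""
    x &= MASK16
    return (x | 0xFFFF_0000) if x & 0x8000 else x


def movzx32_16(x: int) -> int:
    """MOVZX r32, r/m16: zero-extend a 16-bit pattern to 32 bits."""
    return x & MASK16


def as_signed32(v: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    return v - (1 << 32) if v & 0x8000_0000 else v


def imul32(dst: int, src: int) -> int:
    """IMUL r32, r/m32: signed multiply, keep only the low 32 bits."""
    return (as_signed32(dst) * as_signed32(src)) & MASK32


if __name__ == "__main__":
    # Signed: -3 * 1000, with -3 given as the 16-bit pattern 0xFFFD.
    print(as_signed32(imul32(movsx32_16(0xFFFD), movsx32_16(1000))))  # -3000
    # Unsigned: 40000 * 3. MOVZX is required; MOVSX would make 40000 negative.
    print(imul32(movzx32_16(40000), movzx32_16(3)))                   # 120000
    # Unsigned 0xFFFF * 0xFFFF: exceeds 2**31, register value is still exact.
    print(hex(imul32(movzx32_16(0xFFFF), movzx32_16(0xFFFF))))        # 0xfffe0001
```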

Likewise again, x86-64 added 64-bit registers. For the existing multiplication instructions, they kept the behavior the same (so 32*32=64 still splits its result across EDX:EAX instead of using a single 64-bit register), and they added a 64*64=128 widening multiply which, again, must split its result, leaving it in RDX:RAX. There is also a non-widening 64*64=64 signed IMUL, which leaves its result in a single 64-bit register.
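The same kind of sketch extends to the wider forms, again assuming registers are plain unsigned ints and using hypothetical helper names. Inputs are truncated to the operand width. In 64-bit mode the 32-bit form writes EDX and EAX, which, like any 32-bit register write there, zero-extends into the upper halves of RDX and RAX; the model returns only the 32-bit halves:

```python
# Hypothetical model of the 32- and 64-bit unsigned MUL forms.
# Registers are plain Python ints; flags are not modeled.

MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def mul32(eax: int, src: int) -> tuple[int, int]:
    """MUL r/m32: EDX:EAX = EAX * src, still split even in 64-bit mode."""
    product = (eax & MASK32) * (src & MASK32)
    return product >> 32, product & MASK32      # (EDX, EAX)


def mul64(rax: int, src: int) -> tuple[int, int]:
    """MUL r/m64: RDX:RAX = RAX * src; no register can hold 128 bits."""
    product = (rax & MASK64) * (src & MASK64)
    return product >> 64, product & MASK64      # (RDX, RAX)


if __name__ == "__main__":
    edx, eax = mul32(MASK32, MASK32)
    print(hex(edx), hex(eax))   # 0xfffffffe 0x1
    rdx, rax = mul64(MASK64, MASK64)
    print(hex(rdx), hex(rax))   # 0xfffffffffffffffe 0x1
```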