Why is LEA (Load Effective Address) necessary?

I have read this, this, and this, but have yet to find the answer I'm looking for.

Firstly, I understand the difference between lea and mov in terms of the result they can achieve, put simply:

mov  eax, ebp   ;put the value in ebp register into eax register
lea  eax, [ebp] ;same as above, and they are equivalent

However:

mov  eax, ebp+8   ;invalid register set size
lea  eax, [ebp+8] ;calculate the sum of the value in ebp and 8, then assign it to eax

So why is mov eax, ebp+8 illegal while lea eax, [ebp+8] is OK? My book says:

The value that MOV stores into EAX must be computed by the assembler (that is, it must in the end be a constant)

But it makes no sense to me! What is meant by CONSTANT? The obvious understanding is that the CONSTANT should be calculated by the assembler/linker before the program runs, BUT consider that mov eax, [ebp+8] is a LEGAL instruction. The assembler/linker has no way of knowing the value of [ebp+8] (*(ebp+8) in C lingo) before the program runs!

Solution

Your book seems to only be talking about mov-immediate, like mov eax, 1234 or mov eax, foo (symbol address).

A more complete rule is that the source operand for mov has to either be:

a link-time constant value (mov-immediate)
or a value that already exists somewhere else (memory or another register).

mov can only copy; it won't feed the data through an ALU before writing the destination.

A memory source operand can use an addressing mode like [ebp+8] or [edx + eax*4 + my_array], but that's unrelated to what happens to the data loaded from or stored to that address. Address-generation is done in a separate part of the CPU (or on original 8086, in a separate phase of microcode processing the instruction).
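To make the split between address generation and data movement concrete, here is a minimal C sketch. It is a toy model, not how any CPU is implemented: `effective_address`, `toy_memory`, `load32`, `mov_from_memory`, and `lea_address` are hypothetical names made up for this illustration, and a small byte array stands in for RAM.

```c
#include <stdint.h>
#include <string.h>

/* Address generation: base + index*scale + disp, wrapping at 32 bits.
   Hypothetical helper; real addressing modes only allow scale 1, 2, 4 or 8. */
static uint32_t effective_address(uint32_t base, uint32_t index,
                                  uint32_t scale, uint32_t disp)
{
    return base + index * scale + disp;
}

/* Toy flat memory standing in for RAM (an assumption for illustration). */
static uint8_t toy_memory[64];

/* The data side of a load: read 4 bytes at an already-computed address. */
static uint32_t load32(uint32_t addr)
{
    uint32_t value;
    memcpy(&value, &toy_memory[addr], sizeof value);
    return value;
}

/* Models mov eax, [ebp+8]: compute the address, then copy what is stored there. */
static uint32_t mov_from_memory(uint32_t ebp)
{
    return load32(effective_address(ebp, 0, 1, 8));
}

/* Models lea eax, [ebp+8]: compute the same address and stop there. */
static uint32_t lea_address(uint32_t ebp)
{
    return effective_address(ebp, 0, 1, 8);
}
```

Both functions call the same address calculation; only the mov model goes on to touch memory.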

x86's machine-code format encodes addressing modes (and register vs. memory source) the same way for almost all instructions, so mov doesn't have to do anything special to support mov eax, [ebp+8]. It's just like add eax, [ebp+8], which does the same address calculation and load, then adds the loaded data to eax. Only the opcode byte differs between those, specifying what to do with the data, not where the operands are (register vs. memory addressing mode). (There are two opcodes for most instructions, one where the source can be memory, one where the destination can be memory. I'm talking about the two memory-source opcodes here.)

The machine code for lea eax, [ebp+8] is also the same as those except for the opcode byte. The special part of LEA is that it uses the machine-code format for an addressing mode to encode a shift/add instruction that doesn't actually access memory. See Using LEA on values that aren't addresses / pointers?
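As a rough illustration of "same except for the opcode byte", the sketch below lists the 32-bit-mode encodings of the three instructions and splits the shared ModRM byte into its fields. The byte values are the standard encodings; the names `decode_modrm` and `same_operand_bytes` are hypothetical helpers invented for this example.

```c
#include <stdint.h>
#include <string.h>

/* 32-bit-mode encodings: opcode byte, ModRM byte, 8-bit displacement. */
static const uint8_t MOV_EAX_MEM[3] = { 0x8B, 0x45, 0x08 }; /* mov eax, [ebp+8] */
static const uint8_t ADD_EAX_MEM[3] = { 0x03, 0x45, 0x08 }; /* add eax, [ebp+8] */
static const uint8_t LEA_EAX_MEM[3] = { 0x8D, 0x45, 0x08 }; /* lea eax, [ebp+8] */

/* The three fields packed into a ModRM byte. */
struct modrm {
    unsigned mod; /* 1 here: memory operand with an 8-bit displacement */
    unsigned reg; /* 0 here: the register operand, eax */
    unsigned rm;  /* 5 here: base register ebp */
};

/* Hypothetical helper: split a ModRM byte into mod (bits 7-6),
   reg (bits 5-3) and rm (bits 2-0). */
static struct modrm decode_modrm(uint8_t byte)
{
    struct modrm f;
    f.mod = (byte >> 6) & 3u;
    f.reg = (byte >> 3) & 7u;
    f.rm  = byte & 7u;
    return f;
}

/* Hypothetical helper: nonzero if two 3-byte encodings agree on
   everything after the opcode byte (ModRM + displacement). */
static int same_operand_bytes(const uint8_t *a, const uint8_t *b)
{
    return memcmp(a + 1, b + 1, 2) == 0;
}
```

The operand bytes 45 08 are identical in all three; what each instruction does with [ebp+8] is decided entirely by the first byte.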

No other instruction can do that, so you can never use eax+3 as a source operand for any instruction. For example, there's no way to do imul ecx, eax+3 with only one instruction.
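One way to get that effect is to split it into two instructions, with LEA doing the add into a scratch register first. The C sketch below models that sequence; the function name `imul_by_eax_plus_3` and the choice of edx as the scratch register are assumptions made for illustration.

```c
#include <stdint.h>

/* Models the effect of the two-instruction sequence:
       lea  edx, [eax+3]   ; edx = eax + 3, no memory access
       imul ecx, edx       ; ecx = ecx * edx
   Unsigned types are used so 32-bit wraparound is well defined in C;
   the low 32 bits of the product are the same for signed and unsigned. */
static uint32_t imul_by_eax_plus_3(uint32_t ecx, uint32_t eax)
{
    uint32_t edx = eax + 3;   /* lea edx, [eax+3] */
    ecx = ecx * edx;          /* imul ecx, edx */
    return ecx;
}
```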

There are two separate concepts: where the data comes from for an instruction (immediate, register, or memory addressed by an addressing-mode) vs. what happens to that data (copy vs. imul vs. sub vs. popcount vs. ...). In that sense, LEA doesn't have a data input, since it doesn't deref the [ebp+8].

LEA just takes the address, like C's & address-of operator, which cancels out what would otherwise be a dereference, e.g. &ptr[3] is the same as ptr+3 but with different syntax. One thing that makes LEA interesting is that x86 ADD can only do ptr+=3, modifying a register (or memory) in place, but LEA can copy-and-add like tmp = &ptr[3].

(And with 32 or 64-bit addressing modes, also add a shifted register like tmp = &ptr[x*4 + 3])
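The C side of that analogy can be sketched as follows. The function names are hypothetical, and char pointers are used so that C's element offsets match the byte offsets in the asm. A compiler targeting x86 may well turn each of the last two functions into a single lea, but that is not guaranteed.

```c
#include <stddef.h>

/* ptr += 3: modifies the pointer in place, like add reg, 3. */
static void add_in_place(const char **ptr)
{
    *ptr += 3;
}

/* tmp = &ptr[3]: leaves ptr untouched, like lea tmp, [ptr+3].
   The & cancels the [] dereference, so no memory is read. */
static const char *copy_and_add(const char *ptr)
{
    return &ptr[3];
}

/* tmp = &ptr[x*4 + 3]: like lea tmp, [ptr + x*4 + 3]. */
static const char *copy_scale_add(const char *ptr, size_t x)
{
    return &ptr[x * 4 + 3];
}
```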

Somewhat off-topic but maybe related:

Most classic x86 integer instructions have only one or two operands (What kind of address instruction does the x86 cpu have?), although imul ecx, [edx+8], 123 can do math while writing the result to a register that wasn't an input. (Unlike most immediate-source instructions, that new-in-186 form of imul didn't steal the /r field in ModRM as extra opcode bits, so it could have 3 operands including the immediate whose presence is signalled by the opcode.) But still the input to the hardware multiplier is an immediate and a value directly from memory; imul ecx, edx+8, 123 isn't encodable.