Calculating cost of an instruction in assembly language

I am reading about code generation from the dragon book. It gives a naive method to associate a cost with each target language association.

We shall assume each target-language instruction has an associated cost. For simplicity, we take the cost of an instruction to be one plus the costs associated with the addressing modes of the operands. This cost corresponds to the length in words of the instruction. Addressing modes involving registers have zero additional cost, while those involving a memory location or constant in them have an additional cost of one, because such operands have to be stored in the words following the instruction.

Some examples:

The instruction LD R0,R1 copies the contents of register R1 into register R0. This instruction has a cost of one because no additional memory words are required.
The instruction LD R0,M loads the contents of memory location M into register R0. The cost is two since the address of memory location M is in the word following the instruction.
The instruction LD R1,*100(R2) loads into register R1 the value given by contents(contents(100+contents(R2))). The cost is three because the constant 100 is stored in the word following the instruction. Here contents(x) denotes the contents of the register or memory location represented by x.

I understood the cost calculation for the first 2 examples. I don't get the third one. How is the cost 3? Also I don't understand the bold part in the quoted text above.

Of what I partially understood I supposed the cost of BLTZ *R3,R0 to be 3 as it is so for the similar third example above. But the cost of this is 1. How?

Note BLTZ r, L causes a jump to label L if the value in register r is less than zero, and allows control to pass to the next machine instruction if not.

Solution

I don't get the third one. How is the cost 3?

1 for the instruction
1 for fetching the constant 100 from the next word
1 for fetching the value from R2+100

cost of BLTZ *R3,R0 to be 3 as it is so for the similar third example above. But the cost of this is 1. How?

Because: Addressing modes involving registers have zero additional cost.

I don't understand the bold part in the quoted text above.

Apparently the machine code for this architecture uses a word to specify the opcode along with the register operands, and an additional word if needed for a constant.

For example LD R1,*100(R2) takes 2 words. The first specifies the operation (LD with a register-relative offset) along with the destination register R1 and the base register R2. These are bitfields in the word. The second word then contains the 100 which the cpu knows to fetch because of the opcode in the first word.

Some fixed-length architectures pack constants into the first word, but then obviously they can only have limited range. Using a separate word allows for full range.