Usage of the .W suffix in ARM assembler

I'm reading the Instruction set manual, and am wondering about the .W suffix.

The manual says:

In some cases it might be necessary to specify the .W suffix, for example if the operand is the label of an instruction or literal data, as in the case of branch instructions. This is because the assembler might not automatically generate the right size encoding.

I can't think of any reason why I'd need to override the assembler's default encoding. Everything I've tried so far assembled fine without .W.

Did they really add a special syntax to work around a possible bug in the assembler (perhaps fixed in the meantime)?

Can you provide an example where I have to use .W explicitly (Thumb-2 on Cortex-M3)?

Solution

The .W suffix is still a necessary part of the syntax, even if most people will never write it explicitly, because it preserves a round-trip guarantee: disassembling any valid code and passing the resulting source back through the assembler yields exactly the same instructions. Every valid instruction encoding must therefore have an unambiguous representation in the language, even when it isn't the one the assembler would choose by default for the base mnemonic and operands. And because the suffix exists in the syntax, you don't have to write machine code directly or patch binaries to get those encodings in the first place.

Now, there are various esoteric reasons for wanting non-preferred encodings in the final binary - things like tweaking instruction alignment, using code as data, etc. - but the least crazy one is probably relocations. If you have a branch to an external symbol, or a symbol in a different section, the assembler doesn't necessarily know how far away that symbol is going to end up. Therefore it has to choose between emitting a narrow instruction which may end up unlinkable, or a wide instruction which may end up wasting code space if the target ends up close enough. Both GNU as and armasm seem to go for the latter, although it's not that hard to imagine some specialised embedded assembler defaulting to the former for size reasons.
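To make that trade-off concrete, here is a minimal Python sketch of the choice an assembler faces for an unconditional Thumb-2 branch. The function name pick_branch_encoding and the "go wide when the distance is unknown" policy are illustrative assumptions, not the actual logic of GNU as or armasm. The ranges are those of the 16-bit B and 32-bit B.W encodings, measured from the Thumb PC value (address of the branch + 4).

```python
# Illustrative model only; not taken from any real assembler.
# Branch offsets are relative to the Thumb PC value (instruction address + 4).
NARROW_B_RANGE = (-2048, 2046)          # 16-bit B: 11-bit signed halfword offset
WIDE_B_RANGE = (-16777216, 16777214)    # 32-bit B.W: 24-bit signed halfword offset


def fits(offset, rng):
    """True if an even byte offset lies inside the given inclusive range."""
    low, high = rng
    return offset % 2 == 0 and low <= offset <= high


def pick_branch_encoding(offset=None):
    """Return 'narrow' or 'wide' for an unconditional branch.

    offset=None models a target whose distance is unknown at assembly time
    (an external symbol, or one in another section) and is left to a relocation.
    """
    if offset is None:
        # Hypothetical policy: play safe so the link cannot fail on range.
        return "wide"
    if fits(offset, NARROW_B_RANGE):
        return "narrow"
    if fits(offset, WIDE_B_RANGE):
        return "wide"
    raise ValueError("target out of range for a direct branch")
```

Writing b.w or b.n explicitly takes this decision away from the assembler, whatever its default policy is: you state which of the two encodings you want instead of relying on the fallback for an unknown distance.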

Runtime relocations are an even stronger argument: you have a branch or literal load which is resolved at assembly time, but you might want to hotpatch it under certain circumstances to target something else. That could require the extra range of a wide encoding even when the original target lies within range of a narrow one.
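The hotpatching case can be sketched the same way. The Python below checks whether a branch that has already been emitted can be re-pointed at a new target without changing its size. The helper can_retarget and all addresses are made up for illustration; the ranges are again those of the 16-bit B and 32-bit B.W encodings, relative to the Thumb PC value (address of the branch + 4).

```python
# Illustrative model only; addresses are invented for the example.
# Offsets are relative to the Thumb PC value (address of the branch + 4).
RANGES = {
    "narrow": (-2048, 2046),          # 16-bit B
    "wide": (-16777216, 16777214),    # 32-bit B.W
}


def can_retarget(branch_addr, new_target, encoding):
    """True if a branch already emitted at branch_addr can reach new_target."""
    low, high = RANGES[encoding]
    offset = new_target - (branch_addr + 4)
    return offset % 2 == 0 and low <= offset <= high


branch_addr = 0x08000100
original_target = 0x08000140   # close by: a narrow branch would be enough
patched_target = 0x08010000    # hypothetical replacement routine, far away

print(can_retarget(branch_addr, original_target, "narrow"))  # True
print(can_retarget(branch_addr, patched_target, "narrow"))   # False
print(can_retarget(branch_addr, patched_target, "wide"))     # True
```

An assembler that sees only the original target has every reason to pick the narrow form, which is why a branch you intend to patch later is a natural place to write b.w by hand.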