
Where in the NASM spec is the syntax FFFFh for hexadecimal number not allowed?


I am trying to assemble a tiny piece of code with NASM. This code sets CX to all zeros and AX to all ones. My code:

mov cx, 0000h
mov ax, ffffh

But I get this error:

$ nasm foo.asm
foo.asm:2: error: symbol `ffffh' not defined

I can resolve this error by writing mov ax, 0ffffh instead. But why does it not understand the ffffh syntax? Where in the NASM documentation does it specify what hexadecimal syntax is allowed and what is not?

I read https://nasm.us/doc/nasmdoc3.html#section-3.4.1 but cannot find anything there that disallows the ffffh syntax. What am I missing?

I also read some of the similar questions linked in the comments on this question. But none of them seems to point to authoritative documentation or a specification confirming that a number must begin with a digit. If someone can point to the exact excerpt in the NASM documentation or some spec that confirms this, that would answer this question.


Solution

  • Surprisingly, I don't see an explicit statement of the grammar rule that a numeric literal must start with a decimal digit. It's mentioned indirectly in the section you linked: a hex number prefixed with a $ sign must have a digit after the $ rather than a letter. But they don't say "must still", which would at least imply that a leading digit is always required.

    Earlier, in 3.1, they say that identifiers must start with letters, but don't say that only identifiers can start with letters. (That wouldn't be true anyway: register names and instruction mnemonics can too. But numeric literals can't.)


    This might be one of those things that's so "obvious" and well known to be true (to the developers / manual authors) that they forgot to write it down explicitly anywhere in the manual.

    The hex examples do show it, though, including 0c8h but no c8h. They do show examples in other bases where the leading zero isn't required.
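
    As a minimal sketch of the hex spellings that section documents, applied to the value from the question (the bits 16 directive and the choice of registers are just assumptions for illustration):

    ```nasm
            bits 16
            mov  cx, 0000h      ; already starts with a digit
            mov  ax, 0ffffh     ; h suffix, with the leading 0 added
            mov  ax, 0xffff     ; 0x prefix
            mov  ax, 0hffff     ; 0h prefix
            mov  ax, $0ffff     ; $ prefix, digit still needed after the $
    ```

    Every form has a decimal digit as the first character of the number itself, which is the unstated rule in question.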


    Some of the things that make it "obvious" and necessary that tokens starting with an alphabetic character should never be parsed as numeric literals:

    • AH through DH are register names, so must not get parsed as numbers. It would be very weird if EH was a numeric literal but DH wasn't. (It's normal that register names fit the same pattern as symbol names, not numbers. Unless you're on PowerPC, where GAS syntax just uses bare numbers for both registers and immediates; you have to remember which positions are which by instruction. Or use gcc -mregnames. But that's an IBM architecture so of course it uses weird conventions, like numbering the bits backwards.)

    • It would be super weird for abcdefgh to be a symbol name but abcdefh to be a numeric literal (because without the g, it's all valid hex digits and a trailing h.)

    • You couldn't use English words like each: as label / symbol names, for the same reason you can't use 1234:. (I tried; foo.asm:1: error: label or instruction expected at start of line). That's a valid C identifier, so it would be inconvenient not to be able to use it. $eax lets you use that as a symbol name, but $1234 in NASM is equivalent to 0x1234, with $ doing double duty as a hex indicator, so it doesn't make something into a symbol name if the thing uses digits.

    • And perhaps most importantly, this is how earlier x86 assemblers for DOS worked, the ones whose syntax NASM cherry-picked the good parts from: MASM, but also A86, as86, and others like them.
      In the early days of NASM, people were switching to NASM from other assemblers and would already know this rule. (The question "How to represent hex value such as FFFFFFBB in x86 assembly programming?" mentions a few assemblers other than NASM.)
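
    The points above can be sketched in a few lines. This is only an illustration: the labels each and abcdefh are hypothetical names chosen to look like hex, and the comments describe how I'd expect NASM to treat each operand under the leading-digit rule.

    ```nasm
            bits 16
            mov  dh, 1          ; DH is a register name, not the number 0Dh
            mov  ax, each       ; "each" is a symbol: this loads the label's address
            mov  bx, 0each      ; leading digit makes it a number: hex 0EACh
            mov  cx, abcdefh    ; still a symbol, despite being hex digits plus h
    each:
    abcdefh:
            dw   0              ; both labels point at this word
    ```

    If tokens starting with a letter could be numbers, the second and fourth mov would silently change meaning depending on which letters a label happened to use.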


    None of this justifies the omission from the manual, merely explains it. A wording tweak to mention this in 3.4.1 would be a good idea.