parsing assembly syntax intel-8080

How should assemblers distinguish between symbol and all-alpha hex value?


I'm learning some 8080 assembly, which uses the older suffix H to indicate hexadecimal constants (vs modern prefix 0x or $). I'm also noodling around with a toy assembler and thinking about how to tokenize source code.

It's possible to write a valid hex constant (say) BEEFH, which contains only alphabetical characters. It's also possible to define a label called BEEFH. So when I write:

ORG 0800H

START:  ... 
        JMP BEEFH   ; <--- how is this resolved?
        .... 

BEEFH:  ... 
        ...

This should be syntactically valid based on the old Intel docs: BEEFH meets the naming rules for labels, and of course is also a valid 16-bit address. The ambiguity of whether the operand to JMP here is an address constant or an identifier seems like a problem.

I don't have access to the original 8080 assembler to see what it does with this example. An online 8080 assembler I tried appears to parse the operand to JMP as a label reference in all cases, but a proper assembler obviously has to be able to target an absolute address with a JMP instruction.

Can anyone shed light on what the conventions around this actually are/should be? Am I missing something obvious? Thanks.


Solution

  • Someone left a comment and then deleted it, but I looked again and they were right. I had missed the note in the old Intel manual that says this about hex constants:


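    Hex constants must begin with a decimal digit. So the address in my example has to be written 0BEEFH, and a bare BEEFH is always a symbol. That removes the ambiguity at the tokenizing stage: the first character alone tells you whether you are looking at a number or an identifier. It seems a bit inelegant to me as a solution, but if that bothers you, I guess you can just use a modern prefix instead.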

    Thanks, anonymous commenter!
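
For the toy assembler, the leading-digit rule could be sketched roughly like this in Python. This is only an illustration, not how the original Intel assembler works internally: the function name `classify` and the tuple return shape are made up for the example, case-folding to upper case is my assumption, and only the H suffix and plain decimal are handled (other radix suffixes are left out).

```python
def classify(token):
    """Classify one operand token as ('number', value) or ('symbol', name).

    Hypothetical helper for a toy 8080 assembler. Assumes the token is
    non-empty and that source text is case-insensitive.
    """
    text = token.upper()
    if text[:1].isdigit():
        # A leading decimal digit means this can only be a numeric literal.
        if text.endswith("H"):
            # Hex constant: drop the H suffix and parse the rest as base 16.
            return ("number", int(text[:-1], 16))
        # No suffix: treat as decimal (other radix suffixes are not handled).
        return ("number", int(text, 10))
    # Anything starting with a letter is a symbol, even if it looks like hex.
    return ("symbol", text)


if __name__ == "__main__":
    for tok in ("0BEEFH", "BEEFH", "0800H", "10"):
        print(tok, classify(tok))
```

With this rule, `JMP 0BEEFH` jumps to the absolute address and `JMP BEEFH` refers to the label. A malformed literal such as `0XYZH` raises `ValueError` from `int`, which a real assembler would want to turn into a proper diagnostic.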