assembly macros nasm preprocessor

Is there a point in using %assign over %define?


It seems to me that the %define directive, which defines a single-line macro, is just the %assign directive with additional features such as the ability to take parameters. If that's the case, what's the point of using %assign? Additionally, what about %xdefine and equ?

The answer here was not clear to me as it was too short. I also read the documentation but I didn't see any advantage in using %assign.


Solution

  • Summary: %define is just text replacement. %assign is a numeric value, so you can use it to increment a counter inside a %rep with %assign i i + 1 for example, which doesn't usefully work with %define.


    There are actually three different directives the NASM preprocessor provides to define a single-line macro: %define, %xdefine, and %assign. Additionally there is a fourth way, the equ directive, which evaluates an equate at assemble time, not as part of preprocessing.

    (An aside for later: The NASM preprocessor can be used in conjunction with the assembler. In this case the assembler can communicate scalar numeric values to the preprocessor, and the two stages are not strictly separable as distinct passes.)


    %define is text-only replacement. Unlike %assign, a %define single-line macro may accept parameters, specified in round parentheses directly after the macro name. Because %define accepts text, you can also provide it with string data, either quoted strings or plain text (without quotes).

    If you %define i i + 1 and then use the resulting single-line macro, it will expand to the text i + 1. The preprocessor is apparently sophisticated enough not to turn this into an infinite loop. However, it will not take into account the contents that the single-line macro i previously had. If you do not provide a symbol named i to the assembler, it will complain about the symbol not being defined. Example:

    $ cat test1.asm
    %define i 0
    %define i i + 1
    db i
    $ nasm test1.asm
    test1.asm:3: error: symbol `i' not defined
    $ nasm test1.asm -E
    %line 3+1 test1.asm
    db i + 1
    $
    

    Additionally, note that operator precedence can lead to surprising results if you use %define to expand a single-line macro to a numeric expression. Example:

    $ cat test2.asm
    %define macro 1 + 3
    db macro * 10h
    $ nasm test2.asm -l /dev/stderr
         1                                  %define macro 1 + 3
         2 00000000 31                      db macro * 10h
    $
    

    Note that the computed value is equal to 1 + (3 * 10h), and not (1 + 3) * 10h. To get the latter result with %define you need to include parentheses either in the content, like %define macro (1 + 3), or around the use of the macro, like db (macro) * 10h.
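
    A minimal sketch of the first fix, with the parentheses placed in the macro content:

    %define macro (1 + 3)
    db macro * 10h          ; expands to db (1 + 3) * 10h

    This should assemble to the single byte 40h instead of 31h.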

    For an actual example of using %define as a text replacement instead of something that could be handled numerically, consider this directive:

    %define OT(num) (0 %+ num %+ h + OPTYPES_BASE)
    

    The num parameter has a zero digit prepended and an h letter appended. Thus, text such as the 5B in OT(5B) is transformed into a well-formed hexadecimal number.


    %xdefine is a way to define a single-line macro to the textual content obtained after expanding whatever text is passed to the directive.

    You can use %xdefine i i + 1 to effectively increment the numeric value of the single-line macro i, however this will quickly blow up the memory space and processing time needed to process the macro. Example:

    $ cat test3.asm
    %define i 0
    %rep 4
    %xdefine i i + 1
    %endrep
    db i
    $ nasm test3.asm -E
    %line 5+1 test3.asm
    db 0 + 1 + 1 + 1 + 1
    $
    

    As you can see, the %xdefine results in appending the text + 1 to the macro contents rather than making the preprocessor actually count up. Consequently, macros defined by %xdefine can lead to surprising operator precedence, just as with %define. You can use parentheses again, but if you repeatedly %xdefine to grow an expression in its content and include parentheses, then all the parentheses will accumulate too.
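
    To illustrate the accumulation, here is a sketch of the same kind of loop with parentheses in the content (an assumed variation of test3.asm, not taken from any application):

    %define i 0
    %rep 3
    %xdefine i (i + 1)
    %endrep
    db i                    ; expected expansion: db (((0 + 1) + 1) + 1)

    The assembler still computes the value 3, but every iteration adds another pair of parentheses to the stored text.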

    %xdefine macro content is generally equivalent to %define macro %[content]. The square-bracket construct, which forces immediate expansion of macros, is a more recent addition to the NASM preprocessor. That is the reason %xdefine became its own directive, rather than users simply being told to employ the square brackets.
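
    A small sketch of that equivalence, next to a plain %define for contrast. The macro names a, b, c, and d are made up for this illustration, and %[...] needs a NASM version recent enough to support it:

    %define a 1
    %xdefine b a            ; a is expanded now, b holds the text 1
    %define c %[a]          ; same effect through the square brackets
    %define d a             ; no expansion yet, d holds the text a
    %define a 2
    db b, c, d              ; expected expansion: db 1, 1, 2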

    The use of repeated %xdefine is mainly for when you actually want to build up a list of values, such as strings or numeric expressions or symbols. For instance, here is a part of an application's source building up a list of symbols:

    %macro opsizeditem 3.nolist
     %1 equ nextindex
     %xdefine BITTAB_OPSIZEDITEMS BITTAB_OPSIZEDITEMS,%2
     ...
    %endmacro
    %assign nextindex 0
    %define BITTAB_OPSIZEDITEMS ""
    ...
    opsizeditem OP_IMM, ARG_IMMED,  imm ; immediate
    opsizeditem OP_RM,ARG_DEREF+ARG_JUSTREG,rm  ; reg/mem
    ...
    

    This starts out defining the single-line macro BITTAB_OPSIZEDITEMS to "" (the empty quoted string), then appends the multi-line macro's %2 parameter to the list every time the opsizeditem macro is used. The use of this list is simple:

    bittab:
            db BITTAB_OPSIZEDITEMS
    

    A db directive is passed the single-line macro, which expands to the desired list. The very first entry will be the quoted empty string. Like a lone db "" directive, this first entry expands to no assembly output data at all. All subsequent entries assemble to one byte of output each. (If an entry had embedded commas or consisted of a quoted string, it could be assembled into multiple bytes.)
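
    A self-contained sketch of the same pattern, reduced to the list handling. The names LIST, additem, and table are hypothetical and not from the application:

    %define LIST ""
    %macro additem 1
     %xdefine LIST LIST,%1
    %endmacro
    additem 10h
    additem 20h
    additem 30h
    table:
            db LIST         ; expected expansion: db "",10h,20h,30h

    The empty string contributes no bytes, so table should hold the three bytes 10h, 20h, 30h.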


    Unlike the prior directives, %assign actually evaluates its contents to a scalar numeric value. The use of this is so that you can make the preprocessor count, not just have it do text replacement. Example:

    $ cat test4.asm
    %assign i 0
    %rep 4
    %assign i i + 1
    %endrep
    db i
    $ nasm test4.asm -E
    %line 5+1 test4.asm
    db 4
    $
    

    %assign actually evaluates its contents at the point of the assignment. Next it checks that the result is a scalar numeric value; that is, not a symbol expression requiring any relocation. It is then formatted as a signed 64-bit decimal number.
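
    A short sketch of that formatting. The macro names a and b are made up for this illustration:

    %assign a 10h + 1
    %assign b 2 - 5
    db a, b                 ; expected expansion: db 17, -3

    Neither the hexadecimal notation nor the original expression text survives, only the resulting decimal number.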

    As a side effect, the expansion of a single-line macro defined using %assign is always treated as a single term in an expression, because it is expanded to a single term (possibly including a minus sign). Revisiting the example we had for %define but now using %assign instead:

    $ cat test5.asm
    %assign macro 1 + 3
    db macro * 10h
    $ nasm test5.asm -l /dev/stderr
         1                                  %assign macro 1 + 3
         2 00000000 40                      db macro * 10h
    $
    

    As one would intuitively expect from this use of the macro, the expression evaluates to a value equal to (1 + 3) * 10h.

    Here is an application example of evaluating an expression once to then use it twice:

    %assign %$index %$label + 1 - (2 * %$i)
    %if %$index < 0
     %error Invalid opindex content = %$index
    %endif
    [list +]
        db %$index
    [list -]
    

    If we were to replace the macro with the expression as is, we would have to write the expression twice, and the preprocessor and assembler would have to evaluate it twice. If we used %define instead, the expression would only be written once, but it would still need to be evaluated twice.

    For an example using an %assign directive that gets a numeric value from the assembler (as mentioned in the aside), consider this part of the application:

        %macro mne 1-2+;.nolist
    %push
    usesection ASMTABLE2, 1
    %assign %$currofs $ - asmtab
    %ifnempty %2
        db %2
    %endif
    __SECT__
    ...
        dw (%$currofs)<<4|%$string_size         ; 12 bits for asmtab ofs, 4 for length
    ...
    %pop
    %define MNCURRENT %1%[MNSUFFIX]
        %endmacro
    

    This use of %assign avoids inventing a symbol name and permanently entering that symbol into the assembler's symbol table. Instead it evaluates the scalar value that is the delta between $ (the current assembly location in the current section) and the symbol asmtab, and assigns the result to the context-local single-line macro %$currofs. This macro is then used in writing a numeric value to a different section. (Hence %define, or just using the $ - asmtab text where %$currofs is used, would not be correct: we want the $ value in the ASMTABLE2 section.) Afterwards, it discards the context using %pop, which (theoretically) allows the preprocessor to discard all context-local variables and reclaim the memory used by them.


    All three of these preprocessor directives also have corresponding forms with an i prefix, that is %idefine, %ixdefine, and %iassign. These behave the same as the base forms, except that the single-line macro is defined case-insensitively.
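
    For instance (a sketch, with the macro names greeting and count made up for this illustration):

    %idefine greeting "hi"
    %iassign count 2
    db greeting, GREETING   ; both expand to "hi"
    db count, COUNT, Count  ; all three expand to 2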


    Finally, there is equ. The differences between equates and defines are that:

    1. equates are forever (they can only be assigned a single value that cannot change across the entire assembly), whereas defines hold their expansions as they were last defined in source processing order (and particularly can be given new expansions during assembly processing),

    2. an equate can evaluate either to a scalar numeric value (like %assign allows too) or a relocatable symbolic value (unlike %assign),

    3. an equate cannot evaluate to an arbitrary text or quoted string value (unlike %define),

    4. equates are entered into the symbol table, so that they show up in the map file for instance,

    5. equates can be referenced before they are defined (unlike any of the single-line macros),

    6. and finally, equates are evaluated and processed by the assembler stage, not the preprocessor stage.
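
    A sketch of points 1, 2, and 5. The names msg, msg_len, and msg_end are hypothetical:

    db msg_len              ; point 5: used before it is defined
    msg:
            db "hello"
    msg_len equ $ - msg     ; point 2: scalar value, here 5
    msg_end equ msg + 5     ; point 2: relocatable value
    ; msg_len equ 6         ; point 1: a second, different definition would be rejected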

    A label can be considered a special case of an equate, for instance label: is very similar to label equ $. (These two forms differ in their effect on the local-label mechanism however. A true equate is never considered as the base label for local labels. A true label is. Both can participate as local labels though, so that .local: and .local equ $ are entirely the same.) Both a label and an equate can be specified with a colon or without, though typically labels have the colon and equates don't have it. Finally, a label can be followed by an instruction on the same line while an equate cannot.
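
    A sketch of the local-label difference. The names start and size are hypothetical:

    start:                  ; true label, becomes the base for local labels
    .loop:                  ; this is start.loop
    size equ 4              ; equate, does not become the base
    .done:                  ; this is still start.done, not size.done
            jmp start.loop
            jmp start.done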