assembly nasm masm gnu-assembler

How to create symbols with weird names in assembler?


I would like to be able to define a symbol in an assembler file with any name whatsoever that does not contain NUL characters. How do I get the GNU assembler to create such symbols? What about NASM? MASM?

Edit: I am using the following Python script for testing (requires Python 3.5.1+):

#!/usr/bin/python3
# -*- coding: utf-8 -*-

import tempfile
import os.path
import subprocess
import ctypes

def main(symbolname, quoter):
    join = os.path.join
    with tempfile.TemporaryDirectory() as d:
        as_file_name = join(d, 'test.s')
        with open(as_file_name, 'w') as file_object:
            assembler = '''\
\t.globl "{0}"
"{0}":
\tmov $0x0, %rdi # exit status
\tmov $231, %rax # __NR_exit_group
\tsyscall
'''.format(quoter(symbolname))
            file_object.write(assembler)
        objectname, sharedlib = join(d, 'test.o'), join(d, 'test.so')
        subprocess.check_call(['as', '-o', objectname, as_file_name])
        subprocess.check_call(['ld', objectname, '-shared', '-o', sharedlib])
        mydll = ctypes.pydll.LoadLibrary(sharedlib)
        mydll[symbolname]
if __name__ == '__main__':
    main('a', lambda x: x)

I am trying to figure out what to pass to main in place of the identity function, so that the code works for whatever string I use in place of 'a'.


Solution

  • Works for me in GAS: .comm "my weirdsym .$ 12 foo^M bar" 2 (where that ^M is a literal carriage return, and makes the output of objdump -t look funny).

    Creating such symbols with the label: syntax probably isn't always possible. The GAS manual doesn't mention quoted label names in its description of the statement syntax, and it doesn't work for me: test.S:52: Error: junk at end of line, first unrecognized character is '"' for an input of "foobar":.

    If you really want this, you can probably use .set to get a context where a symbol name is expected, so you can use quotes. Then you can give a symbol whatever value you want, including the value of another symbol (e.g. a sensibly-named label).

    For example (thanks @FUZxxl):

    # symbol includes a literal double quote and a literal carriage return (^M)
    # symbol value (address) is . which means the current position
    .set "\"my weirdsym .$ 12 foo^M bar", .
        nop
    

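    objdump -drwC -Mintel output (the carriage return moves the cursor back to the start of the line, so " bar>:" is printed over the first characters of the address):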

     bar>:00000000a7 <"my weirdsym .$ 12 foo
      a7:   90                      nop
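
    For the quoter in the question's test script, a minimal sketch along these lines might be a starting point. The helper names gas_quote and alias_directive are made up for illustration. It assumes GAS treats \" and \\ inside a quoted symbol name as escapes (the example above only demonstrates \") and passes every other character through literally, which may vary between binutils versions.

```python
def gas_quote(symbol_name):
    """Wrap symbol_name in double quotes for contexts where GAS expects a symbol.

    Assumption: GAS processes \\" and \\\\ inside quoted symbol names.
    All other characters are emitted literally.
    """
    if "\0" in symbol_name:
        raise ValueError("symbol names cannot contain NUL")
    escaped = symbol_name.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def alias_directive(symbol_name, target="."):
    """Build a .set line that gives symbol_name the value of target.

    target defaults to '.', the current position, as in the example above;
    it can also be an ordinary, sensibly named label.
    """
    return "\t.set {0}, {1}\n".format(gas_quote(symbol_name), target)


if __name__ == "__main__":
    # Emits a .set line for a name containing a double quote and spaces.
    print(alias_directive('"my weirdsym .$ 12 foo', "real_label"), end="")
```

    Note that gas_quote adds the surrounding quotes itself, whereas the template in the question already writes "{0}", so one of the two would have to drop them.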
    

    I highly recommend doing some sanity checks on symbol names in your code, because it's probably not very helpful (for anyone debugging your object files) to create symbol names with non-printable characters.

    A custom name-mangling scheme to encode things into characters that are legal for C function/variable names would also work.

    But if you really want to do this, this is how (with GAS). It's probably not possible with NASM.
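
    The name-mangling alternative mentioned above sidesteps assembler quoting entirely, since the result is an ordinary identifier that any assembler should accept as a label. Here is a rough sketch of one possible scheme (the sym_ prefix and the _xx hex escape are arbitrary choices for illustration, not an established convention):

```python
import re

_PLAIN = re.compile(r"[A-Za-z0-9]")


def mangle(name, prefix="sym_"):
    """Encode an arbitrary string as a valid C identifier.

    ASCII letters and digits are kept; every other UTF-8 byte, including
    '_' itself, becomes '_' followed by two lowercase hex digits.
    """
    parts = []
    for byte in name.encode("utf-8"):
        ch = chr(byte)
        if _PLAIN.fullmatch(ch):
            parts.append(ch)
        else:
            parts.append("_{:02x}".format(byte))
    return prefix + "".join(parts)


def demangle(mangled, prefix="sym_"):
    """Invert mangle(): recover the original string from a mangled name."""
    if not mangled.startswith(prefix):
        raise ValueError("not a mangled name: {!r}".format(mangled))
    body = mangled[len(prefix):]
    raw = bytearray()
    i = 0
    while i < len(body):
        if body[i] == "_":
            # An escape is always '_' plus exactly two hex digits.
            raw.append(int(body[i + 1:i + 3], 16))
            i += 3
        else:
            raw.append(ord(body[i]))
            i += 1
    return raw.decode("utf-8")


if __name__ == "__main__":
    print(mangle("my weird sym!"))
```

    Because the underscore is always escaped, every _ in a mangled name starts an escape, so decoding is unambiguous. The cost is that the symbol table no longer contains the original name, only its encoding.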