Tags: assembly, clang, x86-64, gnu-assembler, intel-syntax

How to get `mov rdx, symbol` to move symbol value and not value at symbol's address in clang intel-syntax?


I have the following code which I'm using with clang on macOS:

.intel_syntax noprefix

.data

hello:  .ascii  "Hello world\n"
hello_len = . - hello

.text

.globl  _main

_main:
        mov     rax, 0x2000004
        mov     rdi, 1
        lea     rsi, [rip + hello]
        mov     rdx, hello_len       # <-------
        syscall

        mov     rax, 0x2000001
        syscall

It looks like it should print "Hello world" and exit, but it actually segfaults. The cause is that mov rdx, hello_len tries to load the value stored at address hello_len, instead of using the value of hello_len itself.

In AT&T syntax the line would be movq $hello_len, %rdx, which works properly. What's the equivalent in clang's version of GAS Intel syntax?


Solution

  • With real GAS (on Linux), your code assembles to a mov rdx, sign_extended_imm32 like you want.

    But yes, unfortunately clang assembles it to mov rdx, [0xc]. That may or may not be a bug, but it's definitely an incompatibility. (macOS's gcc command is not the GNU Compiler Collection at all; it's Apple Clang: LLVM backend, clang frontend, absolutely nothing to do with the GNU project.)

    OFFSET hello_len doesn't seem to work either. (On first guess I incorrectly assumed it would, but clang doesn't support the OFFSET operator; its .intel_syntax is not fully usable.)

    This clang bug has already been reported. See also "Why does this simple assembly program work in AT&T syntax but not Intel syntax?"


    Clang can't even assemble its own .intel_syntax noprefix output.
    There may not be a way to get clang Intel syntax to use a symbol's value (address) as an immediate.

    // hello.c
    char hello[] = "abcdef";
    char *foo() { return hello; }
    

    clang -S prints mov eax, offset hello, which won't assemble with clang's built-in assembler! https://godbolt.org/z/x7vmm4.

    $ clang -fno-pie -O1 -S -masm=intel hello.c
    $ clang -c hello.s
    hello.s:10:18: error: cannot use more than one symbol in memory operand
            mov     eax, offset hello
                                ^
    $ clang --version
    clang version 8.0.1 (tags/RELEASE_801/final)
    Target: x86_64-pc-linux-gnu
       ...
    

    IMO this is a bug; you should report it on clang's bug tracker, https://bugs.llvm.org

    (Linux non-PIE executables can take advantage of static addresses being in the low 32 bits of the virtual address space by using mov r32, imm32 instead of a RIP-relative LEA, and certainly not mov r64, imm64.)


    Workarounds: you can't just use the C preprocessor, because . - hello is context-sensitive. It has a different value wherever . (the current location) is at a different position, so a textual substitution wouldn't work.

    Ugly workaround: switch to .att_syntax for the one instruction that needs the symbol as an immediate (mov $hello_len, %edx), then switch back with .intel_syntax noprefix.

    Ugly and inefficient workaround: lea

    This won't work for 64-bit constants, but you can use lea to put a symbol address into a register.

    Unfortunately clang/LLVM always uses a disp32 addressing mode, even for register + small constant, when the small constant is a named symbol. I guess it really is treating it like an address that might have a relocation.

    Given this source:

    ##  your .rodata and  =  or .equ symbol definitions
    
    _main:
            mov     eax, 0x2000004             # optimized from RAX
            mov     edi, 1
            lea     rsi, [rip + hello]
            mov     edx, hello_len             # load
            lea     edx, [hello_len]           # absolute disp32
            lea     edx, [rdi-1 + hello_len]   # reg + disp8 hopefully
    #       mov     esi, offset hello          # clang chokes.
    #       mov     rdx, OFFSET FLAT hello_len # clang still chokes
    .att_syntax
            lea     -1+hello_len(%rdi), %edx
            lea     -1+12(%rdi), %edx
            mov     $hello_len, %edx
    .intel_syntax noprefix
            syscall
    
            mov     rax, 0x2000001
            syscall
    

    clang assembles it to this machine code, as disassembled by objdump -drwC -Mintel. Note that the LEA needs a ModRM + SIB to encode a 32-bit absolute addressing mode in 64-bit code.

       0:   b8 04 00 00 02          mov    eax,0x2000004       # efficient 5-byte mov r32, imm32
       5:   bf 01 00 00 00          mov    edi,0x1
                                                                # RIP-relative LEA
       a:   48 8d 35 00 00 00 00    lea    rsi,[rip+0x0]        # 11 <_main+0x11>   d: R_X86_64_PC32        .data-0x4
    
      11:   8b 14 25 0c 00 00 00    mov    edx,DWORD PTR ds:0xc   # the load we didn't want
      18:   8d 14 25 0c 00 00 00    lea    edx,ds:0xc             # LEA from the same [disp32] addressing mode.
      1f:   8d 97 0b 00 00 00       lea    edx,[rdi+0xb]          # [rdi+disp32] addressing mode, missed optimization to disp8
      25:   8d 97 0b 00 00 00       lea    edx,[rdi+0xb]          # AT&T lea    -1+hello_len(%rdi), %edx same problem
      2b:   8d 57 0b                lea    edx,[rdi+0xb]          # AT&T with lea hard-coded -1+12(%rdi)
      2e:   ba 0c 00 00 00          mov    edx,0xc                # AT&T mov    $hello_len, %edx
    
      33:   0f 05                   syscall 
      35:   48 c7 c0 01 00 00 02    mov    rax,0x2000001          # inefficient mov r64, sign_extended_imm32 from your source
      3c:   0f 05                   syscall 
    

    GAS assembles the lea edx, [rdi-1 + hello_len] line of the same source to 8d 57 0b (lea edx,[rdi+0xb]), using a disp8.

    See https://codegolf.stackexchange.com/questions/132981/tips-for-golfing-in-x86-x64-machine-code/132985#132985. LEA from a register with a known constant value is a code-size win for nearby or small constants, and is actually fine for performance, as long as the register got its known value without depending on a long chain of calculations.

    But as you can see, clang fails to optimize that and still uses a reg+disp32 addressing mode even when the displacement would fit in a disp8. That is still slightly smaller than [abs disp32], which requires a SIB byte; without a SIB byte, that ModRM encoding means [RIP + rel32].