assembly nasm x86-64 mach-o data-segment

NASM compiling x86_64 ASM label addresses off by 256 bytes in Mach-O when using multiple db declarations?


In short, when I have multiple db declarations in my .data section, the label addresses that NASM produces are wrong. In my testing they are off by 256 bytes in the resulting Mach-O binary.

Software I am using:

  • OS X 10.10.5
  • nasm NASM version 2.11.08, installed via Homebrew as required for x86_64 ASM
  • gobjdump GNU objdump (GNU Binutils) 2.25.1, installed via Homebrew
  • clang Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)

What works:

Take for example the following "hello world" NASM assembly.

main.s

global _main

section .text
_main:
mov     rax, 0x2000004
mov     rdi, 1
lea     rsi, [rel msg]
mov     rdx, len
syscall

mov     rax, 0x2000001
mov     rdi, 0
syscall

section .data
msg:    db      "Hello, world!", 10
len:    equ     $ - msg

Compiled and run with:

/usr/local/bin/nasm -f macho64 -o main.o main.s
clang -o main main.o
./main

This works great, and produces the following output:

Hello, world!

What doesn't:

Now, to add another message, we just need to add another string to the data section, and another syscall. Simple enough.

main.s

global _main

section .text
_main:
mov     rax, 0x2000004
mov     rdi, 1
lea     rsi, [rel msga]
mov     rdx, lena
syscall

mov     rax, 0x2000004
mov     rdi, 1
lea     rsi, [rel msgb]
mov     rdx, lenb
syscall

mov     rax, 0x2000001
mov     rdi, 0
syscall

section .data
msga:    db      "Hello, world!", 10
lena:    equ     $ - msga
msgb:    db      "Break things!", 10
lenb:    equ     $ - msgb

Compile and run, same as before, and we get:

Break things!

What?!? Shouldn't we be getting this instead?

Hello, world!
Break things!

What's wrong?:

Something clearly went wrong. Time to disassemble the resulting binary and see what we got.

$ gobjdump -d -M intel main

Produces the following for _main:

0000000100000f7c <_main>:
   100000f7c:  b8 04 00 00 02          mov    eax,0x2000004
   100000f81:  bf 01 00 00 00          mov    edi,0x1
   100000f86:  48 8d 35 73 01 00 00    lea    rsi,[rip+0x173]        # 100001100 <msgb+0xf2>
   100000f8d:  ba 0e 00 00 00          mov    edx,0xe
   100000f92:  0f 05                   syscall
   100000f94:  b8 04 00 00 02          mov    eax,0x2000004
   100000f99:  bf 01 00 00 00          mov    edi,0x1
   100000f9e:  48 8d 35 69 00 00 00    lea    rsi,[rip+0x69]        # 10000100e <msgb>
   100000fa5:  ba 0e 00 00 00          mov    edx,0xe
   100000faa:  0f 05                   syscall
   100000fac:  b8 01 00 00 02          mov    eax,0x2000001
   100000fb1:  bf 00 00 00 00          mov    edi,0x0
   100000fb6:  0f 05                   syscall

From the comment # 100001100 <msgb+0xf2>, we can see that the first lea points not to the msga symbol but to 0xf2 past msgb, or 0x100001100 (at this address there are null bytes, resulting in no output). Inspecting the binary in a hex editor, I find the actual msga string at offset 0x1000, or address 0x100001000. This means that the address in the compiled binary is now off by 0x100 (256) bytes, simply because there is now a second db label. What?!?
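
For anyone who wants to double-check the arithmetic, here is a small Python sketch that recomputes the two lea targets from the disassembly above. The helper name rip_target is made up for illustration; the addresses and displacements are copied from the gobjdump output, and the msga location is the one seen in the hex editor.

```python
LEA_LENGTH = 7  # each `lea rsi,[rip+disp32]` in the listing is encoded in 7 bytes


def rip_target(insn_addr, disp, insn_len=LEA_LENGTH):
    """Resolve a RIP-relative operand (hypothetical helper, for illustration).

    RIP-relative displacements are applied to the address of the *next*
    instruction, i.e. the instruction address plus its encoded length.
    """
    return insn_addr + insn_len + disp


msga_ref = rip_target(0x100000F86, 0x173)  # first lea, meant to load msga
msgb_ref = rip_target(0x100000F9E, 0x69)   # second lea, loads msgb
msga_actual = 0x100001000                  # where the hex editor shows msga

print(hex(msga_ref))                # 0x100001100
print(hex(msgb_ref))                # 0x10000100e
print(hex(msga_ref - msga_actual))  # 0x100, the 256-byte error
print(msgb_ref - msga_actual)       # 14, the length of "Hello, world!\n"
```

So the msgb reference is exactly right (it sits 14 bytes after msga, matching the 0xe loaded into edx), and only the msga reference is displaced by 0x100.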


A sorry excuse for a workaround:

As an experiment, I decided to try putting the two db declarations into separate ASM/object files and linking all three together. Doing so works.

main.s

global _main
extern _msga
extern _lena
extern _msgb
extern _lenb

section .text
_main:
mov     rax, 0x2000004
mov     rdi, 1
lea     rsi, [rel _msga]
mov     rdx, _lena
syscall

mov     rax, 0x2000004
mov     rdi, 1
lea     rsi, [rel _msgb]
mov     rdx, _lenb
syscall

mov     rax, 0x2000001
mov     rdi, 0
syscall

msga.s

global _msga
global _lena

section .data
_msga:   db      "Hello, world!", 10
_lena:   equ     $ - _msga

msgb.s

global _msgb
global _lenb

section .data
_msgb:   db      "Break things!", 10
_lenb:   equ     $ - _msgb

Compile and run with:

/usr/local/bin/nasm -f macho64 -o main.o main.s
/usr/local/bin/nasm -f macho64 -o msga.o msga.s
/usr/local/bin/nasm -f macho64 -o msgb.o msgb.s
clang -o main msga.o msgb.o main.o
./main

Results in:

Hello, world!
Break things!

While this does work, I find it hard to believe this is the best solution.


What is going wrong?

Surely there must be a way to have multiple db labels in one ASM file? Am I doing something wrong in the way I write the ASM? Is this a bug in NASM? Is this expected behavior somehow, in which case why? My workaround is extra work and clutter, so I would greatly appreciate any assistance in this.


Solution

  • Yes, it's a bug in Nasm-2.11.08. Nasm-2.11.06 should work. Nasm-2.11.09rc1 should work(?). Sorry 'bout that!
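
If it helps to guard a build against the affected release, a rough Python sketch along these lines could check the version banner. This is only an illustration: the function names are hypothetical, the banner format is assumed to match the "NASM version 2.11.08" string quoted in the question, and the only version treated as bad is the one named above. Nothing is claimed about other releases.

```python
import re

# Only the release named in the answer is listed; other versions are unknown.
KNOWN_BAD = {"2.11.08"}


def nasm_version(banner):
    """Pull the version token out of a banner like 'NASM version 2.11.08'.

    Returns None if the banner does not look like the assumed format.
    """
    match = re.search(r"NASM version (\S+)", banner)
    return match.group(1) if match else None


def is_known_bad(banner):
    """True only if the banner reports a version listed in KNOWN_BAD."""
    return nasm_version(banner) in KNOWN_BAD


print(is_known_bad("NASM version 2.11.08"))  # True
print(is_known_bad("NASM version 2.11.06"))  # False
```

The banner string would come from running nasm -v by hand or from a build script; the sketch deliberately takes it as a plain string so it does not depend on how NASM is installed.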