Why does the assembly encoding of objdump vary?

I was reading this article about Position Independent Code and I encountered this assembly listing of a function.

0000043c <ml_func>:
 43c:   55                      push   ebp
 43d:   89 e5                   mov    ebp,esp
 43f:   e8 16 00 00 00          call   45a <__i686.get_pc_thunk.cx>
 444:   81 c1 b0 1b 00 00       add    ecx,0x1bb0
 44a:   8b 81 f0 ff ff ff       mov    eax,DWORD PTR [ecx-0x10]
 450:   8b 00                   mov    eax,DWORD PTR [eax]
 452:   03 45 08                add    eax,DWORD PTR [ebp+0x8]
 455:   03 45 0c                add    eax,DWORD PTR [ebp+0xc]
 458:   5d                      pop    ebp
 459:   c3                      ret

0000045a <__i686.get_pc_thunk.cx>:
 45a:   8b 0c 24                mov    ecx,DWORD PTR [esp]
 45d:   c3                      ret

However, on my machine (gcc-7.3.0, Ubuntu 18.04 x86_64), I got slightly different result below:

0000044d <ml_func>:
 44d:   55                      push   %ebp
 44e:   89 e5                   mov    %esp,%ebp
 450:   e8 29 00 00 00          call   47e <__x86.get_pc_thunk.ax>
 455:   05 ab 1b 00 00          add    $0x1bab,%eax
 45a:   8b 90 f0 ff ff ff       mov    -0x10(%eax),%edx
 460:   8b 0a                   mov    (%edx),%ecx
 462:   8b 55 08                mov    0x8(%ebp),%edx
 465:   01 d1                   add    %edx,%ecx
 467:   8b 90 f0 ff ff ff       mov    -0x10(%eax),%edx
 46d:   89 0a                   mov    %ecx,(%edx)
 46f:   8b 80 f0 ff ff ff       mov    -0x10(%eax),%eax
 475:   8b 10                   mov    (%eax),%edx
 477:   8b 45 0c                mov    0xc(%ebp),%eax
 47a:   01 d0                   add    %edx,%eax
 47c:   5d                      pop    %ebp
 47d:   c3                      ret

The main difference I found was that the semantic of mov instruction. In the upper listing, mov ebp,esp actually moves esp to ebp, while in the lower listing, mov %esp,%ebp does the same thing, but the order of operands are different.

This is quite confusing, even when I have to code hand-written assembly. To summarize, my questions are (1) why I got different assembly representations for the same instructions and (2) which one I should use, when writing assembly code (e.g. with __asm(:::);)

Solution

obdjump defaults to -Matt AT&T syntax (like your 2nd code block). See att vs. intel-syntax. The tag wikis have some info about the syntax differences: https://stackoverflow.com/tags/att/info vs. https://stackoverflow.com/tags/intel-syntax/info

Either syntax has the same limitations, imposed by what the machine itself can do, and what's encodeable in machine code. They're just different ways of expressing that in text.

Use objdump -d -Mintel for Intel syntax. I use alias disas='objdump -drwC -Mintel' in my .bashrc, so I can disas foo.o and get the format I want, with relocations printed (important for making sense of a non-linked .o), without line-wrapping for long instructions, and with C++ symbol names demangled.

In inline asm, you can use either syntax, as long as it matches what the compiler is expecting. The default is AT&T, and that's what I'd recommend using for compatibility with clang. Maybe there's a way, but clang doesn't work the same way as GCC with -masm=intel.

Also, AT&T is basically standard for GNU C inline asm on x86, and it means you don't need special build options for your code to work.

But you can use gcc -masm=intel to compile source files that use Intel syntax in their asm statements. This is fine for your own use if you don't care about clang.

If you're writing code for a header, you can make it portable between AT&T and Intel syntax using dialect alternatives, at least for GCC:

static inline
void atomic_inc(volatile int *p) {
    // use __asm__ instead of asm in headers, so it works even with -std=c11 instead of gnu11
    __asm__("lock {addl $1, %0 | add %0, 1}": "+m"(*p));
// TODO: flag output for return value?
   // maybe doesn't need to be asm volatile; compilers know that modifying pointed-to memory is a visible side-effect unless it's a local that fully optimizes away.
   // If you want this to work as a memory barrier, use a `"memory"` clobber to stop compile-time memory reordering.  The lock prefix provides a runtime full barrier
}

source+asm outputs for gcc/clang on the Godbolt compiler explorer.

With g++ -O3 (default or -masm=att), we get

atomic_inc(int volatile*):
    lock addl $1, (%rdi)              # operand-size is from my explicit addl suffix
    ret

With g++ -O3 -masm=intel, we get

atomic_inc(int volatile*):
    lock  add DWORD PTR [rdi], 1      # operand-size came from the %0 expansion
    ret

clang works with the AT&T version, but fails with -masm=intel (or the -mllvm --x86-asm-syntax=intel which that implies), because that apparently only applies to code emitted by LLVM, not for how the front-end fills in the asm template.

The clang error message is:

<source>:4:13: error: unknown use of instruction mnemonic without a size suffix
    __asm__("lock {addl $1, %0 | add %0, 1}": "+m"(*p));
            ^
<inline asm>:1:2: note: instantiated into assembly here
        lock  add (%rdi), 1
        ^
1 error generated.

It picked the "Intel" syntax alternative, but still filled in the template with an AT&T memory operand.