
Is this an assembler bug? (bound instruction)


For the byte pattern 622c24, there are two cases.


The first case: the objdump / as pair.

  1. objdump disassembles 622c24 to: bound %ebp,(%esp)
  2. as (the GNU assembler on Linux) assembles bound %ebp,(%esp) to: 622c24


The second case: the Capstone / Keystone library pair.

  1. Capstone disassembles 622c24 to: bound (%esp), %ebp
  2. Keystone assembles bound (%esp), %ebp to: 622c24


As you can see above, the operand order is reversed between the two pairs.

  • objdump : bound %ebp,(%esp)
  • Capstone : bound (%esp), %ebp



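The Intel manual lists the instruction as BOUND r32, m32&32, and AT&T syntax normally reverses the Intel operand order, so the memory operand should come first.
Therefore, the Capstone/Keystone pair appears to be the correct one.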

Q. So, do objdump and as handle the bound instruction incorrectly?
Is it a bug in binutils?


Solution

  • Yes, this is probably a design bug in AT&T syntax. It normally follows the pattern of reversing the operands from Intel syntax, and renaming the sign/zero-extension mnemonics (cdq => cltd, movsx eax, byte [mem] => movsbl). Deviations from that can be considered design bugs.

    It's not an implementation bug, though, unless older versions behaved differently. It's valid (but unpleasant) when AT&T just does whatever it wants and makes up its own rules for different instructions. This might be another case of compatibility with the original UnixWare assembler (see below).


    The bound instruction doesn't write either of its input operands, so neither one is really a destination. And unlike cmp, operand order doesn't have any meaning. It just checks the register against both upper/lower bounds, and raises a #BR exception if it's out of bounds.
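
As a rough illustration of that check, here is a minimal Python sketch of the 32-bit form. The names bound32 and BoundRangeExceeded are made up for this example; the exception class merely stands in for the CPU's #BR. It assumes the memory operand is a pair of signed 32-bit values, lower bound first, which is also why objdump prints it as a QWORD.

```python
import struct


class BoundRangeExceeded(Exception):
    """Hypothetical stand-in for the CPU's #BR exception."""


def bound32(index: int, memory: bytes) -> None:
    # The memory operand holds two consecutive signed 32-bit integers
    # (little-endian on x86): the lower bound, then the upper bound.
    lower, upper = struct.unpack("<ii", memory[:8])
    if index < lower or index > upper:
        raise BoundRangeExceeded(f"{index} not in [{lower}, {upper}]")
    # In range: nothing is written, neither the register nor memory.


bounds = struct.pack("<ii", 0, 9)  # lower = 0, upper = 9
bound32(5, bounds)                 # in range, returns silently
```

Neither argument is modified, which is the sense in which neither operand is a "destination".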

    There's only one opcode for it, and it requires register + memory operands (in the ModR/M reg and r/m fields).
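
To see where the two operands come from in 62 2c 24, here is a small Python sketch that splits the ModR/M and SIB bytes into their bit fields. decode_bound and split_233 are hypothetical helper names, and the sketch only covers the register-base form used here (no displacement handling, 32-bit addressing assumed).

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def split_233(byte: int):
    # ModR/M and SIB share the same 2-3-3 bit layout.
    return byte >> 6, (byte >> 3) & 7, byte & 7


def decode_bound(code: bytes) -> dict:
    if code[0] != 0x62:
        raise ValueError("not the bound opcode")
    mod, reg, rm = split_233(code[1])
    if mod == 3:
        raise ValueError("bound requires a memory operand")
    info = {"mod": mod, "reg": REG32[reg]}  # reg field: the register operand
    if rm == 4:
        # r/m = 100 means a SIB byte follows and describes the address.
        scale, index, base = split_233(code[2])
        info["base"] = REG32[base]
        info["index"] = None if index == 4 else REG32[index]
        info["scale"] = 1 << scale
    else:
        info["base"] = REG32[rm]  # displacement-only forms are not handled
    return info


print(decode_bound(bytes.fromhex("622c24")))
```

For these bytes the reg field selects ebp and the SIB byte selects esp as the base with no index, so the encoding itself only says "register ebp, memory at [esp]"; which one is printed first is purely a syntax convention.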


    objdump -d lists the register operand first in both AT&T and Intel syntax.

    I assembled db 0x62, 0x2c, 0x24 with NASM and linked it with ld -melf_i386 into a 32-bit ELF executable (because I have a wrapper script that makes it easier to assemble+link+disassemble than to just assemble).

       objdump -drwC -Mintel
    8048060:       62 2c 24                bound  ebp,QWORD PTR [esp]
    
       objdump -drwC -Matt
    8048060:       62 2c 24                bound  %ebp,(%esp)
    

    It does seem to be a quirk of AT&T syntax as implemented in binutils (as / objdump / gdb) that bound requires the register arg to be listed first.

    bound  %eax, (%edx)  # assembles fine
    bound (%edx), %eax   # foo.s:2: Error: operand size mismatch for `bound'
    

    I assume Intel-syntax mode also requires the register operand to be first. There's no ambiguity in meaning here, just an odd design choice not to reverse the operands relative to Intel syntax.


    Related: AT&T syntax also has "bugs" according to the GAS manual:

    9.15.16 AT&T Syntax bugs

    The UnixWare assembler, and probably other AT&T derived ix86 Unix assemblers, generate floating point instructions with reversed source and destination registers in certain cases. Unfortunately, gcc and possibly many other programs use this reversed syntax, so we’re stuck with it.

    For example

       fsub %st,%st(3)
    

    results in %st(3) being updated to %st - %st(3) rather than the expected %st(3) - %st. This happens with all the non-commutative arithmetic floating point operations with two register operands where the source register is %st and the destination register is %st(i).

    So AT&T syntax has actual bugs where both orders are valid and mean different things. I think we can group this operand "reversal" in with that.


    ndisasm -b32 disassembles it as 622C24 bound ebp,[esp], matching the Intel manual's operand order.