
Setting breakpoints in GDB on a program built with YASM -g dwarf2 changes program behaviour and causes SIGSEGV or SIGILL


I'm trying to work out whether a string is a palindrome in assembly. Essentially, I copy the bytes of the string 'my_string' backwards into 'tmp_str', then compare the two strings using repe cmpsb.

The problem I'm facing is that the program randomly hangs or raises SIGILL or SIGSEGV when I set breakpoints in GDB. The behaviour is truly random, or random enough that I can't trace the source of the error to a single line. I am incredibly confused and would appreciate some insight into what may be causing these errors. Note that if I set no breakpoints, or set them on some other lines such as 25, 27, or 28, it works fine.

I have checked, and the for loop does iterate the correct number of times, with the right values. The initial strings are also in the format I expect.

Example errors:

Hangs <- if breakpoint is set on line 30

Program received signal SIGILL, Illegal instruction.
_start.for_1.end () at 4.asm:32
32 jmp .for_1.start
<- when breakpoint was set on line 28 (the bug isn't reproducible, but it did happen)

Program received signal SIGSEGV, Segmentation fault.
0x0000000000400feb in ?? ()
<- when breakpoint was set on line 31

;; Write an assembly program to determine if a string stored in memory is a palindrome
;; (A palindrome is a string which is the same after being reversed, like "refer")
;; Use at least one repeat instruction

                segment .data
my_string       db              "bob", 0

                segment .bss
tmp_str         resb            4

                segment .text
                global _start

_start:
                ;/* Copy initial string BACKWARDS */
                xor             rcx, rcx                        ; have to use manual for loop
.for_1.start:
                cmp             ecx, 4
                je              .for_1.end
                mov             al, byte [my_string+ecx]        ; get character from original string from i'th place
                mov             rbx, tmp_str+2                  ; go to end of tmp_str (writing my_string to tmp_str backwards)
                sub             rbx, rcx                        ; we can't minus from address expression thingy, so just deduct from register stored above
                mov             [rbx], byte al                  ; copy byte in al into mem address stored in rbx
                inc             ecx                             ; increment counter     
                jmp             .for_1.start                    ; unconditional start to stop (for loops check and do conditional jumps at top)
.for_1.end:
                ;/* Compare strings */
                mov             rsi, my_string                  ; now we want to compare strings (remember, pointer in reg moves in movsb)
                mov             rdi, tmp_str                    ; rdi, set to tmp_str address 
                mov             rcx, 2                          ; since rcx was modified by rep, we need to reset that though
                repe            cmpsb                           ; compare strings whilst each val (byte) is equal
                                                                ; now, if rcx is NOT 0, the bytes do not match and is not a palindrome!
                ;/* EOP */
                mov             eax, 1
                xor             ebx, ebx
                int 0x80

If anybody could offer some advice I would be greatly appreciative.

Compile commands: yasm -f elf64 -g dwarf2 <file>.asm ; ld <file>.o -o <file>


Solution

  • YASM is making bad DWARF2 debug info. It's old and unmaintained. Use NASM instead.

    With NASM 2.15.05, nasm -felf64 -g didn't work for me either: GDB 12.1 said there is no line 30 when I tried b 30. But still generally use NASM. I didn't try NASM's DWARF debug-info format; IIRC I've had problems with it in the past, like messing up objdump disassembly, so it's probably not great.

    Don't rely on debug info from NASM or YASM. Use layout asm and set breakpoints on numeric addresses, or at the current position that you single-step to. layout reg / layout n is a good way to get a registers + disassembly view. You can copy/paste addresses from there or from disas to do stuff like b *0x40101b. Start the program with starti so GDB stops before executing the first user-space instruction; from there you can use si to single-step by instruction. See the bottom of https://stackoverflow.com/tags/x86/info for asm debugging tips.

    (Update: the NASM bug with debug info is described in "GDB does not load source lines from NASM". It will hopefully get fixed in a future version of NASM.)

    Assembly language maps 1:1 to machine code, so it's actually helpful to look at the canonical disassembly when debugging; it may help you spot a place where you wrote the wrong thing by accident.


    When I build with YASM 1.3.0 and try single-stepping this with layout reg (so a register + source view), the debug info doesn't seem to match well: I sometimes get two steps on the same source line (other than on the repe, where that's expected; I mean with RIP incrementing).

    I built with yasm -felf64 -gdwarf2 using YASM 1.3.0, ld 2.38, GDB 12.1, on Linux 5.18 (Arch Linux) on bare metal (Skylake CPU). Using b 30 to set a breakpoint there doesn't ever hit the breakpoint; it runs without crashing for me.


    Debug info maps source lines to memory addresses, so when that mapping is wrong, setting a breakpoint by line number in GDB can modify a byte of machine code other than the first byte of an instruction (overwriting it with 0xCC, the INT3 software breakpoint).

    This would lead to occasional illegal instructions, or more commonly to valid but different instructions (e.g. with one byte of an absolute address changed). If a ModRM byte got modified, the instruction might become shorter, leaving later bytes to be decoded as opcodes. (Linux delivers SIGSEGV when user-space tries to run a privileged instruction, so various problems would all raise the same signal, even if the CPU exception was #GP rather than #PF.) Also, overwriting a ModRM byte with 0xCC would change what the register operands are, so later instructions could use a bad register value.
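To make the difference between a correctly placed and a misplaced breakpoint concrete, here is a small Python simulation of what a debugger does when it patches in INT3. The instruction bytes are one valid encoding of the final three instructions of the program, and the offsets are chosen purely for illustration; they are assumptions, not values read from the actual binary or from YASM's debug info.

```python
# Machine code for:  mov eax, 1 ; xor ebx, ebx ; int 0x80
# (one valid encoding; an assembler may choose an equivalent one)
CODE = bytes([0xB8, 0x01, 0x00, 0x00, 0x00,   # mov eax, 1
              0x31, 0xDB,                     # xor ebx, ebx
              0xCD, 0x80])                    # int 0x80

INT3 = 0xCC  # one-byte software breakpoint opcode


def insert_breakpoint(code, offset):
    """Return (patched code, saved original byte), as a debugger would."""
    patched = bytearray(code)
    saved = patched[offset]
    patched[offset] = INT3
    return bytes(patched), saved


# Correct mapping: offset 5 is the first byte of xor ebx, ebx.
good, saved_good = insert_breakpoint(CODE, 5)
# Wrong mapping (hypothetical): offset 6 is the ModRM byte of that instruction.
bad, saved_bad = insert_breakpoint(CODE, 6)

print(good[5:7].hex(" "))  # cc db: the CPU traps on int3 before the xor runs
print(bad[5:7].hex(" "))   # 31 cc: the CPU silently runs xor esp, ecx instead
```

In the second case no trap ever happens at that address, so the debugger never gets a chance to restore the saved byte before the corrupted instruction executes.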

    0xCC as a ModRM byte encodes register (not memory) operands: AH and CL, or ESP and ECX, depending on the operand size. For example, with the first 4 opcodes (add with different operand order and size), from putting db 0, 0xcc and so on into a .asm file to make this example:

      401000:       00 cc                   add    ah,cl
      401002:       01 cc                   add    esp,ecx
      401004:       02 cc                   add    cl,ah
      401006:       03 cc                   add    ecx,esp
    

    Imagine what would happen if any of your mov or sub instructions had their operands replaced with esp,ecx for example! (And if it happens to inc, it could actually change the instruction. Some of your mov-immediate instructions may have modrm bytes, too, since unlike NASM, YASM doesn't optimize mov rcx,2 to mov ecx,2; it uses mov r/m64, sign_extended_imm32.)
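The register names in that listing follow directly from the bit fields of the ModRM byte. The following Python sketch splits a ModRM byte into its mod, reg, and rm fields; the helper name split_modrm and the lookup tables are made up for this illustration, and the tables only cover the case with no REX prefix.

```python
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]          # no REX prefix
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def split_modrm(byte):
    """Split a ModRM byte into its (mod, reg, rm) bit fields."""
    mod = (byte >> 6) & 0b11    # top 2 bits: 3 means rm names a register
    reg = (byte >> 3) & 0b111   # middle 3 bits
    rm = byte & 0b111           # low 3 bits
    return mod, reg, rm


mod, reg, rm = split_modrm(0xCC)
print(mod, reg, rm)            # 3 1 4
print(REG8[rm], REG8[reg])     # ah cl
print(REG32[rm], REG32[reg])   # esp ecx
```

For opcodes 00 and 01 the rm field is the destination, and for 02 and 03 the reg field is, which gives the four operand orders in the listing. In opcode FF the reg field selects the operation instead of a register (/0 is inc, /1 is dec), so inc ecx (FF C1) with its ModRM byte overwritten becomes FF CC, which is dec esp.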

    Or of course messing up the jmp rel8 would jump to the wrong place. (But CC is a negative 8-bit integer so it would jump backwards).
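As a rough sketch of the jmp rel8 case: the displacement byte is a signed 8-bit value added to the address of the next instruction. The address 0x401020 below is invented for illustration and does not come from the program in the question.

```python
def rel8_target(jmp_address, rel8_byte):
    """Target of a 2-byte `jmp rel8` located at jmp_address."""
    # Interpret the displacement byte as a signed 8-bit integer.
    displacement = rel8_byte - 0x100 if rel8_byte >= 0x80 else rel8_byte
    next_instruction = jmp_address + 2   # opcode byte + rel8 byte
    return next_instruction + displacement


# 0x401020 is a made-up address, used only to show the arithmetic.
print(hex(rel8_target(0x401020, 0xCC)))  # 0x400fee, i.e. 52 bytes before 0x401022
```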

    Using GDB to try to examine the situation (e.g. to disassemble the machine code that faulted) may not work, because GDB puts back the original machine code bytes for commands like x /i or disas to try to disassemble.

    You might still see RIP pointing at a privileged instruction, or a bad register value, if an earlier 0xCC byte threw instruction decoding out of sync. But you wouldn't be able to see how execution of earlier instructions could have led to this point, because you wouldn't be seeing the earlier instructions that the CPU actually executed.


    I can reproduce this, confirming stray 0xCC bytes

    I was able to reproduce it by setting breakpoints on lines 29 and 31 as well as 30. When it segfaulted, RIP was 0x40103f, just past the end of the int 0x80.

    Watching the disassembly view and single-stepping by instruction with si (stepi), execution went right through the repe cmpsb in one step, and through the int 0x80, faulting on a 00 00 add [rax],al after it.

    mov rdi,0x402004 loaded 0x4020cc, the wrong address but still inside the same page. So the strings differed on the first byte, explaining why repe cmpsb ran only one iteration.

    mov eax,0x1 loaded RAX with 0xcc. In the 32-bit int 0x80 ABI (which you normally don't want to use in 64-bit code, BTW) that's __NR_setregid32 (check asm/unistd_32.h). So int 0x80 returns with RAX=-1, -EPERM (asm-generic/errno-base.h).

    In both these cases, a 0xcc byte was the 2nd byte of the instruction, the first byte of the immediate. x86 is little-endian, so that messed up the low byte of the value loaded.
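The two corrupted values can be checked with a little arithmetic. This Python sketch overwrites the first (lowest-addressed) byte of a little-endian 32-bit immediate with 0xCC; the function name corrupt_low_byte is made up for this illustration, and the 4-byte width is an assumption about how the immediates were encoded.

```python
def corrupt_low_byte(value, width=4):
    """Overwrite the first byte of a little-endian immediate with 0xCC."""
    raw = bytearray(value.to_bytes(width, "little"))
    raw[0] = 0xCC   # first byte in memory is the least significant byte
    return int.from_bytes(raw, "little")


print(hex(corrupt_low_byte(0x402004)))  # 0x4020cc: the address loaded into RDI
print(hex(corrupt_low_byte(0x1)))       # 0xcc: the value loaded into EAX
print(corrupt_low_byte(0x1))            # 204: the system call number passed to int 0x80
```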