
How can I set a breakpoint in GDB for the open(2) syscall returning -1?


OS: GNU/Linux
Distro: OpenSuSe 13.1
Arch: x86-64
GDB version: 7.6.50.20130731-cvs
Program language: mostly C with minor bits of assembly

Imagine that I have a rather big program that sometimes fails to open a file. Is it possible to set a breakpoint in GDB so that it stops after the open(2) syscall returns -1?

Of course, I can grep through the source code and find all open(2) invocations and narrow down the faulting open() call but maybe there's a better way.

I tried "catch syscall open" followed by "condition N if $rax==-1", but the catchpoint was never hit.
BTW, is it possible to distinguish between a call to a syscall (e.g. open(2)) and a return from a syscall (e.g. open(2)) in GDB?

As a current workaround I do the following:

  1. Run the program in question under GDB
  2. From another terminal, launch a systemtap script:

    stap -g -v -e 'probe process("PATH to the program run under GDB").syscall.return { if( $syscall == 2 && $return <0) raise(%{ SIGSTOP %}) }'
    
  3. After open(2) returns -1, I receive SIGSTOP in the GDB session and can debug the issue.

TIA.

Best regards,
alexz.

UPD: Even though I had tried the approach suggested by n.m before and wasn't able to make it work, I decided to give it another try. After 2 hours it now works as intended, but with some odd workarounds:

  1. I still can't distinguish between a call to a syscall and a return from it
  2. If I use finish in commands, I can't use continue after it, which is expected according to the GDB docs,
    i.e. the following drops to the gdb prompt on each break:

    gdb> comm
    gdb> finish
    gdb> printf "rax is %d\n",$rax
    gdb> cont
    gdb> end
    
  3. Actually, I can avoid using finish and check %rax in commands, but in that case I have to check for -errno rather than -1; e.g. for "Permission denied" I have to check for -13, and for "No such file or directory" for -2. That just doesn't feel right.

  4. So the only way to make it work for me was to define a custom command and use it in the following way:

    (gdb) catch syscall open
    Catchpoint 1 (syscall 'open' [2])
    (gdb) define mycheck
    Type commands for definition of "mycheck".
    End with a line saying just "end".
    >finish
    >finish
    >if ($rax != -1)
     >cont
     >end
    >printf "rax is %d\n",$rax
    >end
    (gdb) comm
    Type commands for breakpoint(s) 1, one per line.
    End with a line saying just "end".
    >mycheck
    >end
    (gdb) r
    The program being debugged has been started already.
    Start it from the beginning? (y or n) y
    Starting program: /home/alexz/gdb_syscall_test/main
    .....
    Catchpoint 1 (returned from syscall open), 0x00007ffff7b093f0 in __open_nocancel () from /lib64/libc.so.6
    0x0000000000400756 in main (argc=1, argv=0x7fffffffdb18) at main.c:24
    24                      fd = open(filenames[i], O_RDONLY);
    Opening test1
    fd = 3 (0x3)
    Successfully opened test1
    
    Catchpoint 1 (call to syscall open), 0x00007ffff7b093f0 in __open_nocancel () from /lib64/libc.so.6
    rax is -38
    
    Catchpoint 1 (returned from syscall open), 0x00007ffff7b093f0 in __open_nocancel () from /lib64/libc.so.6
    0x0000000000400756 in main (argc=1, argv=0x7fffffffdb18) at main.c:24
    ---Type <return> to continue, or q <return> to quit---
    24                      fd = open(filenames[i], O_RDONLY);
    rax is -1
    (gdb) bt
    #0  0x0000000000400756 in main (argc=1, argv=0x7fffffffdb18) at main.c:24
    (gdb) step
    26                      printf("Opening %s\n", filenames[i]);
    (gdb) info locals
    i = 1
    fd = -1
    

Solution

  • Is it possible to set a breakpoint in GDB so that it stops after the open(2) syscall returns -1?

    It's hard to do better than n.m.'s answer for this narrow question, but I would argue that the question is posed incorrectly.

    Of course, I can grep through the source code and find all open(2) invocations

    That is part of your confusion: when you call open in a C program, you are not in fact executing the open(2) system call directly. Rather, you are invoking an open(3) "stub" from your libc, and that stub executes the open(2) system call for you.
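
    As a rough illustration of that distinction (a minimal sketch, not part of the original example; the helper name open_errno is made up here), the caller only ever sees what the libc wrapper reports, i.e. -1 plus errno, and never the kernel's raw return value:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: calls the libc open() wrapper and returns 0 on
 * success, or the errno value that the wrapper set on failure. */
int open_errno(const char *path)
{
    int fd = open(path, O_RDONLY);   /* libc stub, not the raw syscall */
    if (fd == -1)
        return errno;                /* the stub stored the error code here */
    close(fd);                       /* success: release the descriptor */
    return 0;
}
```

    Calling open_errno("/no/such/file") would be expected to yield ENOENT, assuming that path really does not exist on the machine in question.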

    And if you want to set a breakpoint when the stub is about to return -1, that is very easy.

    Example:

    /* t.c */
    #include <sys/stat.h>
    #include <fcntl.h>
    
    int main()
    {
      int fd = open("/no/such/file", O_RDONLY);
      return fd == -1 ? 0 : 1;
    }
    
    $ gcc -g t.c; gdb -q ./a.out
    (gdb) start
    Temporary breakpoint 1 at 0x4004fc: file t.c, line 6.
    Starting program: /tmp/a.out
    
    Temporary breakpoint 1, main () at t.c:6
    6     int fd = open("/no/such/file", O_RDONLY);
    (gdb) s
    open64 () at ../sysdeps/unix/syscall-template.S:82
    82  ../sysdeps/unix/syscall-template.S: No such file or directory.
    

    Here we've reached the glibc system call stub. Let's disassemble it:

    (gdb) disas
    Dump of assembler code for function open64:
    => 0x00007ffff7b01d00 <+0>: cmpl   $0x0,0x2d74ad(%rip)        # 0x7ffff7dd91b4 <__libc_multiple_threads>
       0x00007ffff7b01d07 <+7>: jne    0x7ffff7b01d19 <open64+25>
       0x00007ffff7b01d09 <+0>: mov    $0x2,%eax
       0x00007ffff7b01d0e <+5>: syscall
       0x00007ffff7b01d10 <+7>: cmp    $0xfffffffffffff001,%rax
       0x00007ffff7b01d16 <+13>:    jae    0x7ffff7b01d49 <open64+73>
       0x00007ffff7b01d18 <+15>:    retq
       0x00007ffff7b01d19 <+25>:    sub    $0x8,%rsp
       0x00007ffff7b01d1d <+29>:    callq  0x7ffff7b1d050 <__libc_enable_asynccancel>
       0x00007ffff7b01d22 <+34>:    mov    %rax,(%rsp)
       0x00007ffff7b01d26 <+38>:    mov    $0x2,%eax
       0x00007ffff7b01d2b <+43>:    syscall
       0x00007ffff7b01d2d <+45>:    mov    (%rsp),%rdi
       0x00007ffff7b01d31 <+49>:    mov    %rax,%rdx
       0x00007ffff7b01d34 <+52>:    callq  0x7ffff7b1d0b0 <__libc_disable_asynccancel>
       0x00007ffff7b01d39 <+57>:    mov    %rdx,%rax
       0x00007ffff7b01d3c <+60>:    add    $0x8,%rsp
       0x00007ffff7b01d40 <+64>:    cmp    $0xfffffffffffff001,%rax
       0x00007ffff7b01d46 <+70>:    jae    0x7ffff7b01d49 <open64+73>
       0x00007ffff7b01d48 <+72>:    retq
       0x00007ffff7b01d49 <+73>:    mov    0x2d10d0(%rip),%rcx        # 0x7ffff7dd2e20
       0x00007ffff7b01d50 <+80>:    xor    %edx,%edx
       0x00007ffff7b01d52 <+82>:    sub    %rax,%rdx
       0x00007ffff7b01d55 <+85>:    mov    %edx,%fs:(%rcx)
       0x00007ffff7b01d58 <+88>:    or     $0xffffffffffffffff,%rax
       0x00007ffff7b01d5c <+92>:    jmp    0x7ffff7b01d48 <open64+72>
    End of assembler dump.
    

    Here you can see that the stub behaves differently depending on whether the program has multiple threads or not. This has to do with asynchronous cancellation.

    There are two syscall instructions, and in the general case we'd need to set a breakpoint after each one (but see below).

    But this example is single-threaded, so I can set a single conditional breakpoint:

    (gdb) b *0x00007ffff7b01d10 if $rax < 0
    Breakpoint 2 at 0x7ffff7b01d10: file ../sysdeps/unix/syscall-template.S, line 82.
    (gdb) c
    Continuing.
    
    Breakpoint 2, 0x00007ffff7b01d10 in __open_nocancel () at ../sysdeps/unix/syscall-template.S:82
    82  in ../sysdeps/unix/syscall-template.S
    (gdb) p $rax
    $1 = -2
    

    Voila, the open(2) system call returned -2, which the stub will translate into setting errno to ENOENT (which is 2 on this system) and returning -1.
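
    In C terms, the stub's error path seen in the disassembly can be sketched roughly as follows. The function name translate_syscall_result is hypothetical, and the real stub is assembly generated from a template, so treat this only as a model of the cmp/jae, sub, and or instructions shown earlier:

```c
#include <errno.h>

/* Hypothetical model of the glibc stub's error handling on x86-64.
 * `raw` stands for the value the kernel left in %rax after `syscall`. */
long translate_syscall_result(long raw)
{
    /* cmp $0xfffffffffffff001,%rax ; jae ...
     * An unsigned comparison: true only for raw values in [-4095, -1]. */
    if ((unsigned long)raw >= (unsigned long)-4095L) {
        errno = (int)-raw;   /* xor %edx,%edx ; sub %rax,%rdx ; mov %edx,%fs:(%rcx) */
        return -1;           /* or $0xffffffffffffffff,%rax */
    }
    return raw;              /* success: the value is passed through unchanged */
}
```

    With this model, a raw result of -2 becomes a return value of -1 with errno set to 2 (ENOENT), which is why checking $rax right after the syscall instruction shows -errno, while checking it after the stub returns shows -1.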

    If the open(2) call had succeeded, the condition $rax < 0 would have been false, and GDB would have kept going.

    That is precisely the behavior one usually wants from GDB when looking for one failing system call among many succeeding ones.

    Update:

    As Chris Dodd points out, there are two syscall instructions, but on error they both branch to the same error-handling code (the code that sets errno). Thus, we can set an unconditional breakpoint on *0x00007ffff7b01d49, and that breakpoint will fire only on failure.

    This is much better, because conditional breakpoints slow down execution quite a lot when the condition is false (GDB has to stop the inferior, evaluate the condition, and resume the inferior if the condition is false).