Tags: macos, debugging, breakpoints, clang++, lldb

Why is the breakpoint location 4 bytes past the function address?


Say I have this main.cpp:

#include <iostream>

int foo() {
  return 123;
}

int main() {
  int a = 1234;
  std::cout << foo() + a << std::endl;
  return 0;
}

Then I compile it:

clang++ -g -O0 main.cpp

Now I have this a.out, whose symbol table I can inspect via dsymutil -s a.out:

...
[    11] 00000001 2e (N_BNSYM      ) 01     0000   0000000100003ce0
[    12] 00000002 24 (N_FUN        ) 01     0000   0000000100003ce0 '__Z3foov'
[    13] 00000001 24 (N_FUN        ) 00     0000   0000000000000008
[    14] 00000001 4e (N_ENSYM      ) 01     0000   0000000100003ce0
...

So, this foo function is at 3ce0.

Now, use LLDB to set a breakpoint at this function:

% lldb a.out
(lldb) target create "a.out"
Current executable set to '/Users/royshi/tmp/a.out' (arm64).
(lldb) b foo
Breakpoint 1: where = a.out`foo() + 4 at main.cpp:4:3, address = 0x0000000100003ce4
(lldb)

Notice that the breakpoint is at 0x0000000100003ce4, and that its description reads "where = a.out`foo() + 4 at main.cpp:4:3", which agrees with the 3ce4 address.

So apparently the breakpoint is 4 bytes beyond foo's address (3ce0).
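
To make the arithmetic explicit, here is a minimal sketch that computes the offset from the two addresses above. The target is arm64, where every instruction is 4 bytes wide, so a 4-byte offset corresponds to exactly one instruction. The function and constant names are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Every arm64 (AArch64) instruction is 4 bytes wide.
const std::uint64_t kArm64InstructionBytes = 4;

// Byte distance from a function's symbol address to the breakpoint address.
std::uint64_t breakpoint_offset(std::uint64_t symbol_addr,
                                std::uint64_t breakpoint_addr) {
  assert(breakpoint_addr >= symbol_addr);
  return breakpoint_addr - symbol_addr;
}

// Number of whole arm64 instructions covered by that byte offset.
std::uint64_t instructions_skipped(std::uint64_t offset_bytes) {
  assert(offset_bytes % kArm64InstructionBytes == 0);
  return offset_bytes / kArm64InstructionBytes;
}
```

Calling breakpoint_offset(0x0000000100003ce0, 0x0000000100003ce4) gives 4, and instructions_skipped(4) gives 1, matching the "+ 4" that lldb prints.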

Why is there a 4-byte difference?


EDIT:

Following the above, I moved the dSYM folder away and set the breakpoint again. This time it landed exactly at foo's address (i.e. no extra 4 bytes), so the offset seems to be related to the debug symbols in some way.

# moving a.out.dSYM to somewhere inaccessible
% mv a.out.dSYM tmp2

% lldb a.out
(lldb) target create "a.out"
Current executable set to '/Users/royshi/tmp/a.out' (arm64).
(lldb) b foo
Breakpoint 1: where = a.out`foo(), address = 0x0000000100003ce0
(lldb) ^D

Solution

  • jasonharper got the reason for this right. If you stop at the very start of a function, before the prologue has run and the new stack frame has been set up, the stack unwind might not look right, and variable values, whose locations are given in terms of the stack after the prologue, will be all off. That can be disconcerting if you don't know what's going on. So by default, when you set a breakpoint on a function symbol, lldb moves it forward to the end of the function prologue.

    As for why debug information might affect how lldb determines where the end of the prologue is: the DWARF line table in the debug info has an "end of prologue" marker, which lldb will happily use. If you don't have debug info, lldb falls back to scanning the instructions from the start of the function and matching them against "known prologue patterns". The latter is more like educated guessing, whereas the compiler knows the answer precisely, so the two computations do sometimes differ.

    BTW, if you don't want this behavior, then do:

    (lldb) break set -n <my_function> --skip-prologue 0
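
The decision described above can be sketched as a toy model. This is not lldb's actual code: LineRow, breakpoint_address, and scanned_guess are hypothetical names, the line table is reduced to an address plus a flag, and the instruction-scanning fallback is not implemented but represented by a value the caller supplies.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, heavily simplified stand-in for one row of a DWARF line table.
struct LineRow {
  std::uint64_t address;  // address of the first instruction for this row
  bool prologue_end;      // models the line table's "end of prologue" marker
};

// Toy model of where a by-name breakpoint lands. This is NOT lldb's code.
// `scanned_guess` stands in for the result of the instruction-scanning
// fallback, which this sketch does not implement.
std::uint64_t breakpoint_address(std::uint64_t function_start,
                                 const std::vector<LineRow>& rows,
                                 std::uint64_t scanned_guess,
                                 bool skip_prologue = true) {
  assert(scanned_guess >= function_start);
  if (!skip_prologue) {
    return function_start;  // corresponds to --skip-prologue 0
  }
  for (const LineRow& row : rows) {
    if (row.prologue_end && row.address >= function_start) {
      return row.address;  // debug info available: trust the compiler's marker
    }
  }
  return scanned_guess;  // no marker: fall back to the scanner's guess
}
```

Fed the question's addresses, and assuming the line table marks 0x0000000100003ce4 as the end of the prologue, the model returns 3ce4 when the rows are present and whatever the scanner guessed (3ce0 in the question's second run) when they are absent.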