
How do I find the line of C++ which locks a Linux futex?


I've got a performance problem with a large application written in C++. The program uses only 150% CPU, even though the server is a 24-core hyperthreaded EPYC (48 hardware threads) and other, similar applications can reliably hit the expected 4800% CPU load. iotop shows virtually no I/O, which is expected.

As the program is apparently neither I/O-bound nor CPU-bound, I checked strace and found that the vast majority of traced calls are waits on a single futex. That is to say: 48 of the 50 threads in the program appear to lock the same futex, which explains quite well why the CPU load only barely exceeds 100%.

Example:

[pid 11581] futex(0x55acec47a900, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 11580] futex(0x55acec47a900, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 11579] futex(0x55acec47a900, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 11578] futex(0x55acec47a900, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 11577] futex(0x55acec47a900, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 11576] futex(0x55acec47a900, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>

Now the problem for me is: how do I find the offending code? The program is not deadlocked, just slow, so the usual techniques for finding deadlocks do not work.


Solution

  • The best way I found was to run the program in GDB. Since most threads are blocked, info threads shows most of them in the same state; in my case they were blocked in __lll_lock_wait. Switching to any one of these threads and printing its backtrace (bt) gave me a stack trace showing how it ended up in __lll_lock_wait. Three levels up the stack I found my offending code.
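
To try this technique on something small first, a minimal sketch of the same symptom may help. Everything in it is hypothetical and not taken from the original application: g_stats_mutex, g_total, contended_worker, batched_worker and run_workers are illustrative names. It assumes the contention comes from a single std::mutex that every thread takes on each iteration, which is one common way to end up with many threads waiting on the same futex address.

```cpp
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared state guarded by one global mutex (illustrative names).
std::mutex g_stats_mutex;
std::uint64_t g_total = 0;

// Takes the global lock on every iteration, so all threads serialize on it.
// Under contention, blocked threads wait inside the mutex lock call.
void contended_worker(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        std::lock_guard<std::mutex> guard(g_stats_mutex);
        g_total += 1;
    }
}

// Same result, but accumulates locally and takes the lock once per thread.
void batched_worker(int iterations) {
    std::uint64_t local = 0;
    for (int i = 0; i < iterations; ++i) {
        local += 1;
    }
    std::lock_guard<std::mutex> guard(g_stats_mutex);
    g_total += local;
}

// Runs `threads` copies of `worker` and returns the final counter value.
std::uint64_t run_workers(void (*worker)(int), int threads, int iterations) {
    {
        std::lock_guard<std::mutex> guard(g_stats_mutex);
        g_total = 0;  // reset shared state before each run
    }
    std::vector<std::thread> pool;
    pool.reserve(static_cast<std::size_t>(threads));
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back(worker, iterations);
    }
    for (auto& th : pool) {
        th.join();
    }
    std::lock_guard<std::mutex> guard(g_stats_mutex);
    return g_total;
}
```

Calling run_workers(contended_worker, 48, some_large_count) from a main function, building with debug symbols (for example g++ -g -O0 -pthread), and interrupting the program in GDB should let you repeat the steps above: info threads, then thread <N> and bt on one of the blocked threads, or thread apply all bt to dump every stack at once. The frame a few levels above the lock call would be contended_worker. The exact frame names and their depth depend on the glibc version and on how much the compiler inlined, so treat "three levels up" as specific to my case.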