Is there a way to get valgrind to use multiple processors?
I'm doing some bottleneck profiling with valgrind's callgrind tool and noticed that my application's resource usage under valgrind/callgrind differs significantly from its usage when run natively.
When run outside valgrind, it maxes out several processors, but inside valgrind it uses only one. This makes me worry that my bottlenecks will be in different places, which would invalidate my profiling.
According to the Valgrind documentation, Valgrind does not support running on multiple processors:
The main thing to point out with respect to threaded programs is that your program will use the native threading library, but Valgrind serialises execution so that only one (kernel) thread is running at a time. This approach avoids the horrible implementation problems of implementing a truly multithreaded version of Valgrind, but it does mean that threaded apps run only on one CPU, even if you have a multiprocessor or multicore machine.
Valgrind doesn't schedule the threads itself. It merely ensures that only one thread runs at once, using a simple locking scheme. The actual thread scheduling remains under control of the OS kernel. What this does mean, though, is that your program will see very different scheduling when run on Valgrind than it does when running normally. This is both because Valgrind is serialising the threads, and because the code runs so much slower than normal.
This difference in scheduling may cause your program to behave differently, if you have some kind of concurrency, critical race, locking, or similar, bugs. In that case you might consider using the tools Helgrind and/or DRD to track them down.
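To make the quoted "simple locking scheme" concrete, here is a toy model in Python. It is only a sketch of the idea, not Valgrind's actual implementation: the name `run_serialised`, the single `big_lock`, and the bookkeeping dictionary are all hypothetical. Each worker must hold the one lock to do a slice of work, so the peak number of threads doing work at the same time is 1, while the order in which threads obtain the lock is still left to the OS scheduler. (CPython's own global interpreter lock already limits parallelism, so this illustrates the concept rather than measuring real CPU usage.)

```python
import threading


def run_serialised(num_threads=4, slices=50):
    """Run worker threads that may only do work while holding one shared lock.

    Returns (peak_concurrency, slices_completed).
    """
    big_lock = threading.Lock()    # hypothetical stand-in for the single "run" lock
    stats_lock = threading.Lock()  # protects the bookkeeping below
    stats = {"running": 0, "peak": 0, "done": 0}

    def worker():
        for _ in range(slices):
            # Only the thread holding big_lock may execute a work slice.
            with big_lock:
                with stats_lock:
                    stats["running"] += 1
                    stats["peak"] = max(stats["peak"], stats["running"])

                sum(i * i for i in range(1000))  # simulated work

                with stats_lock:
                    stats["running"] -= 1
                    stats["done"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats["peak"], stats["done"]


if __name__ == "__main__":
    peak, done = run_serialised()
    print(f"peak concurrent workers: {peak}, slices completed: {done}")
```

However many threads are started, the peak stays at 1. That is the same reason a threaded program under Valgrind never occupies more than one CPU, and why its lock contention and scheduling can look different from a native run.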