valgrind stalls in multithreaded socket program


I'm running a multithreaded socket program under valgrind. The client sends a request to the server over TCP and then busy-waits on a boolean. The boolean is set when the callback function that services the server's response is called. Once the response has been received (and the boolean flag is set), the client sends out another request, and this repeats in a loop.

I realise that unsynchronised access to shared variables (the boolean) can cause threading issues, but I've tried using pthread mutexes and the program slows down by about 20% (speed is important here). I'm confident that writing to the shared boolean variable is fine, as it can be done in a single cycle.

The program runs fine outside of valgrind, but will often stall when run under valgrind. It usually takes a few seconds to complete, and I have left it running overnight without it finishing, so I don't think it's a case of not waiting long enough. The threading is managed by the open source engine framework (QuickFIX), so I don't think it's a problem with how the threads are created or managed.

Does anyone know of any problems with valgrind around multithreaded programs, busy-wait loops, socket communications, or a combination of these?


Solution

  • While other answers focus on insisting that you take the standard synchronization approach (something I fully agree with), I thought instead I should answer your question regarding Valgrind.

    As far as I know, Valgrind has no known problems with multithreaded programs. It does serialize the application so that only one thread runs at a time, much as if it were on a single core, but other than that it should not affect your threads.

    What Valgrind is probably doing to your application is altering the timings and interactions between your threads in ways that might be exposing bugs and race conditions in your code that you don't normally see while running stand-alone.
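    To make that concrete, here is a minimal sketch of a busy-wait that does not depend on lucky timing. It assumes C11 atomics are available; `response_ready`, `on_response`, `wait_for_response` and `run_spin_demo` are hypothetical names standing in for your flag and callback, not anything from your code or from the framework. The atomic flag prevents the compiler from caching the read in the loop, and the yield gives the thread that sets the flag a chance to run when threads are serialized.

    ```c
    #include <pthread.h>
    #include <sched.h>
    #include <stdatomic.h>
    #include <stdbool.h>

    /* Hypothetical stand-ins for the shared flag and the response payload. */
    static atomic_bool response_ready = false;
    static int response_value = 0;

    /* Hypothetical callback thread: write the payload first, then publish
     * the flag with release ordering so the waiter sees the payload. */
    static void *on_response(void *arg)
    {
        (void)arg;
        response_value = 42;
        atomic_store_explicit(&response_ready, true, memory_order_release);
        return NULL;
    }

    /* Spin until the flag is set, yielding the CPU on every iteration so
     * the thread that sets the flag is able to make progress. */
    static int wait_for_response(void)
    {
        while (!atomic_load_explicit(&response_ready, memory_order_acquire))
            sched_yield();
        /* Clear the flag so the next request/response round can reuse it. */
        atomic_store_explicit(&response_ready, false, memory_order_relaxed);
        return response_value;
    }

    /* Runs one request/response round; returns the payload or -1 on error. */
    int run_spin_demo(void)
    {
        pthread_t t;
        if (pthread_create(&t, NULL, on_response, NULL) != 0)
            return -1;
        int v = wait_for_response();
        pthread_join(t, NULL);
        return v;
    }
    ```

    Calling `run_spin_demo()` from `main` and building with `-pthread` is enough to try it, both stand-alone and under Valgrind. Whether this is fast enough for your case is something only measurement can tell.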

    The same logic you applied to decide that the bug could not be in the open source threading framework you are using also applies to Valgrind in my opinion. I recommend that you consider these hangs as bugs in your code and debug them as such, because that is most likely what they are.

    As a side note, using a mutex is probably overkill for the problem you described. You should investigate semaphores or condition variables instead.
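    As a rough illustration of the semaphore route, the sketch below replaces the busy-wait with two POSIX semaphores, one signalling "request sent" and one signalling "response ready". It assumes unnamed POSIX semaphores as found on Linux; `fake_server`, `run_request_loop` and the other names are hypothetical, and the server thread here only simulates the socket round trip.

    ```c
    #include <errno.h>
    #include <pthread.h>
    #include <semaphore.h>

    static sem_t request_sem;    /* posted by the client for each request   */
    static sem_t response_sem;   /* posted by the server for each response  */
    static int latest_response;  /* payload handed from server to client    */

    /* Wait on a semaphore, retrying if a signal interrupts the wait. */
    static void sem_wait_retry(sem_t *s)
    {
        while (sem_wait(s) == -1 && errno == EINTR)
            ;
    }

    /* Hypothetical stand-in for the server plus response callback:
     * answers request number i with the value i. */
    static void *fake_server(void *arg)
    {
        int rounds = *(const int *)arg;
        for (int i = 1; i <= rounds; i++) {
            sem_wait_retry(&request_sem);   /* sleep until a request arrives */
            latest_response = i;
            sem_post(&response_sem);        /* wake the waiting client       */
        }
        return NULL;
    }

    /* Client loop: send a request, sleep until the response is ready,
     * repeat. Returns the sum of all responses, or -1 on setup failure. */
    int run_request_loop(int rounds)
    {
        pthread_t server;
        int sum = 0;

        if (sem_init(&request_sem, 0, 0) != 0)
            return -1;
        if (sem_init(&response_sem, 0, 0) != 0)
            return -1;
        if (pthread_create(&server, NULL, fake_server, &rounds) != 0)
            return -1;

        for (int i = 0; i < rounds; i++) {
            sem_post(&request_sem);         /* "send" the request            */
            sem_wait_retry(&response_sem);  /* block instead of spinning     */
            sum += latest_response;
        }

        pthread_join(server, NULL);
        sem_destroy(&request_sem);
        sem_destroy(&response_sem);
        return sum;
    }
    ```

    With four rounds the responses are 1, 2, 3 and 4, so `run_request_loop(4)` returns 10. The waiting thread sleeps in `sem_wait` rather than burning cycles, which also means it cannot starve the thread that produces the response.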

    Good luck.