Mysterious memory leaks in preloader-based lock tracing tool

I'm working on a lock tracing tool designed to be attached to Pthreads-based applications using LD_PRELOAD, and I've encountered a weird issue. When a test application is run under valgrind with my tracer attached, valgrind reports several memory leaks originating in libpthread's pthread_cond_signal() and pthread_cond_wait() (my tool shadows these functions to implement the tracing functionality). These leaks do not occur when my tool is not attached. Sample leak report:

==12993== 48 bytes in 1 blocks are definitely lost in loss record 1 of 6
==12993==    at 0x483DD99: calloc (vg_replace_malloc.c:762)
==12993==    by 0x48C8629: pthread_cond_wait@GLIBC_2.2.5 (old_pthread_cond_wait.c:34)
==12993==    by 0x48775EF: pthread_cond_wait (pthread_trace.cpp:39)
==12993==    by 0x10C060: shard_get (shard.c:68)
==12993==    by 0x10BC38: resolver_thread (req_res.c:74)
==12993==    by 0x487789A: inject_thread_registration(void*) (pthread_trace.cpp:85)
==12993==    by 0x48C0608: start_thread (pthread_create.c:477)
==12993==    by 0x49FC292: clone (clone.S:95)

I have no idea why this is happening, because my code doesn't interact with the Pthreads objects at all, beyond taking their address for logging. Here's the code for my wrapper functions:

int pthread_cond_wait(pthread_cond_t* cond, pthread_mutex_t* lk) {
        // log arrival at wait
        the_tracer.add_event(lktrace::event::COND_WAIT, (size_t) cond);
        // run pthreads function
        GET_REAL_FN(pthread_cond_wait, int, pthread_cond_t*, pthread_mutex_t*);
        int e = REAL_FN(cond, lk);
        if (e == 0) the_tracer.add_event(lktrace::event::COND_LEAVE, (size_t) cond);
        else the_tracer.add_event(lktrace::event::COND_ERR, (size_t) cond);
        return e;
}

int pthread_cond_signal(pthread_cond_t* cond) {
        // log cond signal
        the_tracer.add_event(lktrace::event::COND_SIGNAL, (size_t) cond);
        // run pthreads function
        GET_REAL_FN(pthread_cond_signal, int, pthread_cond_t*);
        return REAL_FN(cond);
}

// GET_REAL_FN definition:
#define GET_REAL_FN(name, rtn, params...) \
        typedef rtn (*real_fn_t)(params); \
        static const real_fn_t REAL_FN = (real_fn_t) dlsym(RTLD_NEXT, #name); \
        assert(REAL_FN != NULL) // semicolon absence intentional

And, for completeness, here's the relevant glibc code for pthread_cond_signal (the code for pthread_cond_wait is identical, except for the function called in the final return statement):

int
__pthread_cond_signal_2_0 (pthread_cond_2_0_t *cond)
{
  if (cond->cond == NULL)
    {
      pthread_cond_t *newcond;

      newcond = (pthread_cond_t *) calloc (sizeof (pthread_cond_t), 1); // leak alloc'd here
      if (newcond == NULL)
        return ENOMEM;

      if (atomic_compare_and_exchange_bool_acq (&cond->cond, newcond, NULL))
        /* Somebody else just initialized the condvar.  */
        free (newcond);
    }

  return __pthread_cond_signal (cond->cond);
}

The test program (that my tool attaches to) does clean up its condvars before exiting (and as mentioned, has no memory leaks when run without the tool). I'm pretty mystified by this, y'all got any ideas? I'm sure it's something simple that's staring me in the face, it always is...

Solution

I bet you that if you:

1. Change the name of your function to something other than pthread_cond_wait, say pthread_cond_wait_my,
2. Create a small test snippet that invokes the _my variant,
3. Create a minimal shared library that has a dummy implementation of pthread_cond_wait_my, and link the snippet with it (just as it would link with libpthread),
4. Run that snippet, with your tracing library brought in via LD_PRELOAD, just as before,

... there'll be no leak reported, even though your library will still do exactly the same thing, just under a different name :)
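
To make the experiment concrete, here is a rough sketch of the dummy implementation and the test snippet from the steps above. It is only an illustration: run_snippet is a made-up name, and both functions are shown in one listing even though the dummy would live in its own shared library and the tracer would provide its renamed wrapper via LD_PRELOAD.

```c
#include <stddef.h>
#include <pthread.h>

/* Hypothetical stand-in for the minimal shared library: a dummy
 * pthread_cond_wait_my that does no real waiting and reports success.
 * In the actual experiment this would be built as a separate .so. */
int pthread_cond_wait_my(pthread_cond_t *cond, pthread_mutex_t *lk) {
    (void) cond;
    (void) lk;
    return 0;
}

/* Hypothetical test snippet: sets up a mutex and condvar, calls the _my
 * variant once with the mutex held, then cleans both up.
 * Returns 0 if setup and the call succeeded, non-zero otherwise. */
int run_snippet(void) {
    pthread_mutex_t lk;
    pthread_cond_t cond;
    int rc;

    if (pthread_mutex_init(&lk, NULL) != 0)
        return -1;
    if (pthread_cond_init(&cond, NULL) != 0) {
        pthread_mutex_destroy(&lk);
        return -1;
    }

    pthread_mutex_lock(&lk);
    rc = pthread_cond_wait_my(&cond, &lk);
    pthread_mutex_unlock(&lk);

    pthread_cond_destroy(&cond);
    pthread_mutex_destroy(&lk);
    return rc;
}
```

A main that simply returns run_snippet() would then be run under Valgrind with the tracing library preloaded, exactly as the original test application was.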

If that's the case, then the leak is "new" to you, but is not actually new: it is a genuine leak in the pthreads library that's normally suppressed from the diagnostic output. Valgrind comes with a whole bunch of suppressions for runtime libraries; it'd be crazily noisy otherwise. But your tool provides the pthread_cond_wait symbol, so Valgrind ends up matching its suppressions against your function instead of the one they were meant for (in the runtime library), and the leak is no longer hidden.