linux, pthreads, glibc

pthread_cond_timedwait returns one second early


The program below produces this output:

$ ./test_condvar 9000
1343868189.623067126 1343868198.623067126 FIRST
1343868197.623132345 1343868206.623132345 TIMEOUT
1343868205.623190120 1343868214.623190120 TIMEOUT
1343868213.623248184 1343868222.623248184 TIMEOUT
1343868221.623311549 1343868230.623311549 TIMEOUT
1343868229.623369718 1343868238.623369718 TIMEOUT
1343868237.623428856 1343868246.623428856 TIMEOUT

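Reading across a row shows the intended 9-second delta between the current time and the deadline, but reading down the first column shows that pthread_cond_timedwait returns ETIMEDOUT after only about 8 seconds (for example, 1343868197.623132345 - 1343868189.623067126 = 8.000065219).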

The pthread library is glibc 2.12, running on Red Hat Enterprise Linux 6. uname -a shows 2.6.32-131.12.1.el6.x86_64 #1 SMP Tue Aug 23 11:13:45 CDT 2011 x86_64 x86_64 x86_64 GNU/Linux

It looks like pthread_cond_timedwait relies on lll_futex_timed_wait for the timeout behavior.

Any ideas on where else to search for an explanation?

#include <time.h>
#include <sys/time.h>
#include <pthread.h>
#include <errno.h>
#include <stdlib.h>
#include <stdio.h>

int main ( int argc, char *argv[] )
{
    pthread_mutexattr_t mtx_attr;
    pthread_mutex_t mtx;
    pthread_condattr_t cond_attr;
    pthread_cond_t cond;

    int milliseconds;
    const char *res = "FIRST";

    if ( argc < 2 )
    {
        fputs ( "must specify interval in milliseconds\n", stderr );
        exit ( EXIT_FAILURE );
    }

    milliseconds = atoi ( argv[1] );

    pthread_mutexattr_init ( &mtx_attr );
    pthread_mutexattr_settype ( &mtx_attr, PTHREAD_MUTEX_NORMAL );
    pthread_mutexattr_setpshared ( &mtx_attr, PTHREAD_PROCESS_PRIVATE );

    pthread_mutex_init ( &mtx, &mtx_attr );
    pthread_mutexattr_destroy ( &mtx_attr );

#ifdef USE_CONDATTR
    pthread_condattr_init ( &cond_attr );
    if ( pthread_condattr_setclock ( &cond_attr, CLOCK_REALTIME ) != 0 )
    {
        fputs ( "pthread_condattr_setclock failed\n", stderr );
        exit ( EXIT_FAILURE );
    }

    pthread_cond_init ( &cond, &cond_attr );
    pthread_condattr_destroy ( &cond_attr );
#else
    pthread_cond_init ( &cond, NULL );
#endif

    for (;;)
    {
        struct timespec now, ts;
        clock_gettime ( CLOCK_REALTIME, &now );

        /* Absolute deadline: now + interval, with tv_nsec kept in [0, 1e9). */
        ts.tv_sec = now.tv_sec + milliseconds / 1000;
        ts.tv_nsec = now.tv_nsec + (milliseconds % 1000) * 1000000;
        if (ts.tv_nsec >= 1000000000)
        {
            ts.tv_nsec -= 1000000000;
            ++ts.tv_sec;
        }

        printf ( "%ld.%09ld %ld.%09ld %s\n", (long) now.tv_sec, now.tv_nsec,
                 (long) ts.tv_sec, ts.tv_nsec, res );

        pthread_mutex_lock ( &mtx );
        if ( pthread_cond_timedwait ( &cond, &mtx, &ts ) == ETIMEDOUT )
            res = "TIMEOUT";
        else
            res = "OTHER";
        pthread_mutex_unlock ( &mtx );
    }
}
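
One way to narrow down where the missing second goes is to take pthreads out of the picture: sleep until an absolute CLOCK_REALTIME deadline with clock_nanosleep and measure how long that actually took on CLOCK_MONOTONIC. The following is only a sketch of that idea; the helper name abs_realtime_sleep_elapsed_ms is made up for illustration. If this also comes back about a second short, the cause is more likely in the kernel's timer handling than in glibc's condition variable code.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/*
 * Hypothetical diagnostic helper (not part of the original program).
 * Sleeps until "now + ms" as an absolute CLOCK_REALTIME deadline and
 * stores the elapsed time, measured on CLOCK_MONOTONIC, in *elapsed_ms.
 * Returns 0 on success or the error number from clock_nanosleep.
 */
static int abs_realtime_sleep_elapsed_ms(long ms, long *elapsed_ms)
{
    struct timespec deadline, m0, m1;
    int rc;

    clock_gettime(CLOCK_MONOTONIC, &m0);
    clock_gettime(CLOCK_REALTIME, &deadline);

    /* Build the absolute deadline, keeping tv_nsec in [0, 1e9). */
    deadline.tv_sec  += ms / 1000;
    deadline.tv_nsec += (ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_nsec -= 1000000000L;
        ++deadline.tv_sec;
    }

    /* clock_nanosleep returns the error number directly; retry on EINTR. */
    do {
        rc = clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &deadline, NULL);
    } while (rc == EINTR);

    clock_gettime(CLOCK_MONOTONIC, &m1);
    *elapsed_ms = (long)(m1.tv_sec - m0.tv_sec) * 1000
                + (m1.tv_nsec - m0.tv_nsec) / 1000000;
    return rc;
}
```

Calling it with 9000 should report roughly 9000 ms on a healthy system. A result near 8000 ms would match the output shown above.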

Solution

  • A Linux kernel bug was triggered by the leap second inserted at the end of June 30, 2012 (UTC). It caused futex timeouts to expire one second too early until either the machine was rebooted or the system clock was set again with this workaround:

    # date -s "`date`"

    It sounds like you've been bitten by that.
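
If the wait only needs to last for an interval rather than end at a particular wall-clock time, a common way to make it independent of wall-clock adjustments is to put the condition variable on CLOCK_MONOTONIC with pthread_condattr_setclock and compute the deadline from that same clock. The sketch below assumes the early expiry is tied to CLOCK_REALTIME timeouts, which is my understanding of this bug but is not verified here. The helper names (deadline_after_ms, cond_init_monotonic, wait_ms_monotonic) are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical helper: absolute deadline `ms` milliseconds from now on `clk`. */
static struct timespec deadline_after_ms(clockid_t clk, long ms)
{
    struct timespec ts;

    clock_gettime(clk, &ts);
    ts.tv_sec  += ms / 1000;
    ts.tv_nsec += (ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {   /* keep tv_nsec in [0, 1e9) */
        ts.tv_nsec -= 1000000000L;
        ++ts.tv_sec;
    }
    return ts;
}

/* Hypothetical helper: initialise `cond` so timed waits use CLOCK_MONOTONIC.
 * Returns 0 on success or a pthread error number. */
static int cond_init_monotonic(pthread_cond_t *cond)
{
    pthread_condattr_t attr;
    int rc = pthread_condattr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
    if (rc == 0)
        rc = pthread_cond_init(cond, &attr);
    pthread_condattr_destroy(&attr);
    return rc;
}

/* Wait on a condition variable that nobody signals until `ms` milliseconds
 * have passed on CLOCK_MONOTONIC. Returns ETIMEDOUT on a normal timeout. */
static int wait_ms_monotonic(long ms)
{
    pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t cond;
    struct timespec ts;
    int rc = cond_init_monotonic(&cond);

    if (rc != 0)
        return rc;

    /* The deadline must come from the same clock the condvar uses. */
    ts = deadline_after_ms(CLOCK_MONOTONIC, ms);

    pthread_mutex_lock(&mtx);
    do {
        /* A return of 0 here can only be a spurious wakeup, so wait again. */
        rc = pthread_cond_timedwait(&cond, &mtx, &ts);
    } while (rc == 0);
    pthread_mutex_unlock(&mtx);

    pthread_cond_destroy(&cond);
    pthread_mutex_destroy(&mtx);
    return rc;
}
```

In the test program from the question, this would amount to defining USE_CONDATTR, passing CLOCK_MONOTONIC instead of CLOCK_REALTIME to pthread_condattr_setclock, and reading the start time for the deadline from CLOCK_MONOTONIC as well.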