Tags: c, linux, timer, gettimeofday

Why does a periodic timerfd Linux timer sometimes expire a little earlier than expected?


I am using a Linux timer, specifically a timerfd, which I set to expire periodically, for instance every 200 ms.

I noticed, however, that the timer sometimes seems to expire a little earlier than the timeout I set.

In particular, I'm using the following C code to perform a simple test:

#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <poll.h>
#include <unistd.h>
#include <inttypes.h>
#include <sys/timerfd.h>
#include <sys/time.h>

#define NO_FLAGS_TIMER 0
#define NUM_TESTS 10

// Function to compute the difference between two struct timeval values.
// The operation performed is: out = out - in
static inline int timevalSub(struct timeval *in, struct timeval *out) {
    time_t original_out_tv_sec=out->tv_sec;

    if (out->tv_usec < in->tv_usec) {
        int nsec = (in->tv_usec - out->tv_usec) / 1000000 + 1;
        in->tv_usec -= 1000000 * nsec;
        in->tv_sec += nsec;
    }

    if (out->tv_usec - in->tv_usec > 1000000) {
        int nsec = (out->tv_usec - in->tv_usec) / 1000000;
        in->tv_usec += 1000000 * nsec;
        in->tv_sec -= nsec;
    }

    out->tv_sec-=in->tv_sec;
    out->tv_usec-=in->tv_usec;

    // '1' is returned when the result is negative
    return original_out_tv_sec < in->tv_sec;
}

// Function to create a timerfd and set it with a periodic timeout of 'time_ms', in milliseconds
int timerCreateAndSet(struct pollfd *timerMon,int *clockFd,uint64_t time_ms) {
    struct itimerspec new_value;
    time_t sec;
    long nanosec;

    // Create monotonic (increasing) timer
    *clockFd=timerfd_create(CLOCK_MONOTONIC,NO_FLAGS_TIMER);
    if(*clockFd==-1) {
        return -1;
    }

    // Convert time, in ms, to seconds and nanoseconds
    sec=(time_t) ((time_ms)/1000);
    nanosec=1000000*time_ms-sec*1000000000;
    new_value.it_value.tv_nsec=nanosec;
    new_value.it_value.tv_sec=sec;
    new_value.it_interval.tv_nsec=nanosec;
    new_value.it_interval.tv_sec=sec;

    // Fill pollfd structure
    timerMon->fd=*clockFd;
    timerMon->revents=0;
    timerMon->events=POLLIN;

    // Start timer
    if(timerfd_settime(*clockFd,NO_FLAGS_TIMER,&new_value,NULL)==-1) {
        close(*clockFd);
        return -2;
    }

    return 0;
}


int main(void) {
    struct timeval tv,tv_prev,tv_curr;
    int clockFd;
    struct pollfd timerMon;
    unsigned long long junk;

    gettimeofday(&tv,NULL);
    timerCreateAndSet(&timerMon,&clockFd,200); // 200 ms periodic expiration time

    tv_prev=tv;

    for(int a=0;a<NUM_TESTS;a++) {
        // No error check on poll() just for the sake of brevity...
        // The final code should contain a check on the return value of poll()
        poll(&timerMon,1,-1);
        (void) read(clockFd,&junk,sizeof(junk));

        gettimeofday(&tv,NULL);
        tv_curr=tv;
        if(timevalSub(&tv_prev,&tv_curr)) {
            fprintf(stdout,"Error! Negative timestamps. The test will be interrupted now.\n");
            break;
        }
        printf("Iteration: %d - curr. timestamp: %lu.%lu - elapsed after %f ms - real est. delta_t %f ms\n",a,tv.tv_sec,tv.tv_usec,200.0,
            (tv_curr.tv_sec*1000000+tv_curr.tv_usec)/1000.0);

        tv_prev=tv;
    }

    return 0;
}

After compiling it with gcc:

gcc -o timertest_stackoverflow timertest_stackoverflow.c

I'm getting the following output:

Iteration: 0 - curr. timestamp: 1583491102.833748 - elapsed after 200.000000 ms - real est. delta_t 200.112000 ms
Iteration: 1 - curr. timestamp: 1583491103.033690 - elapsed after 200.000000 ms - real est. delta_t 199.942000 ms
Iteration: 2 - curr. timestamp: 1583491103.233687 - elapsed after 200.000000 ms - real est. delta_t 199.997000 ms
Iteration: 3 - curr. timestamp: 1583491103.433737 - elapsed after 200.000000 ms - real est. delta_t 200.050000 ms
Iteration: 4 - curr. timestamp: 1583491103.633737 - elapsed after 200.000000 ms - real est. delta_t 200.000000 ms
Iteration: 5 - curr. timestamp: 1583491103.833701 - elapsed after 200.000000 ms - real est. delta_t 199.964000 ms
Iteration: 6 - curr. timestamp: 1583491104.033686 - elapsed after 200.000000 ms - real est. delta_t 199.985000 ms
Iteration: 7 - curr. timestamp: 1583491104.233745 - elapsed after 200.000000 ms - real est. delta_t 200.059000 ms
Iteration: 8 - curr. timestamp: 1583491104.433737 - elapsed after 200.000000 ms - real est. delta_t 199.992000 ms
Iteration: 9 - curr. timestamp: 1583491104.633736 - elapsed after 200.000000 ms - real est. delta_t 199.999000 ms

I would expect the real time difference, estimated with gettimeofday(), never to be less than 200 ms (also because of the time needed to clear the event with read()), yet some values are a bit less than 200 ms, such as 199.942000 ms.

Do you know why I'm observing this behavior?

Could it be that, since I'm using gettimeofday(), tv_prev is sometimes taken a bit late (due to some varying delay in calling read() or gettimeofday() itself) while tv_curr, in the next iteration, is not, producing an estimated time of less than 200 ms even though the timer actually expires precisely every 200 ms?

Thank you very much in advance.


Solution

  • This is related to process scheduling. The timer is indeed quite precise and signals the timeout every 200 ms, but your program will not register the expiration until it actually gets control back. That means that the time you get from the gettimeofday() call can reflect a somewhat later moment. When you subtract such delayed values, you can get a result either bigger or smaller than 200 ms.
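    If this explanation is correct, each wakeup should lag the ideal schedule by a non-negative amount, even though the difference between two consecutive wakeups can dip below 200 ms. The following minimal sketch (not the code from the question) checks this by comparing every wakeup against t0 + k * 200 ms on the same CLOCK_MONOTONIC clock the timer uses; error handling is omitted for brevity:

    #include <stdio.h>
    #include <stdint.h>
    #include <time.h>
    #include <unistd.h>
    #include <poll.h>
    #include <sys/timerfd.h>

    int main(void) {
        struct itimerspec its = {
            .it_value    = {.tv_sec = 0, .tv_nsec = 200000000L},
            .it_interval = {.tv_sec = 0, .tv_nsec = 200000000L}
        };
        struct timespec t0, now;
        uint64_t expirations;
        int fd = timerfd_create(CLOCK_MONOTONIC, 0);

        clock_gettime(CLOCK_MONOTONIC, &t0);
        timerfd_settime(fd, 0, &its, NULL);

        for (int k = 1; k <= 10; k++) {
            struct pollfd p = {.fd = fd, .events = POLLIN};

            poll(&p, 1, -1);
            (void) read(fd, &expirations, sizeof(expirations));

            clock_gettime(CLOCK_MONOTONIC, &now);
            // Lag of this wakeup with respect to the ideal schedule
            // t0 + k * 200 ms; it should never be negative, while the
            // deltas between consecutive wakeups may well be below 200 ms.
            double lag_ms = (now.tv_sec - t0.tv_sec) * 1e3
                            + (now.tv_nsec - t0.tv_nsec) / 1e6
                            - k * 200.0;
            printf("Expiration %2d: wakeup lag = %+f ms\n", k, lag_ms);
        }

        close(fd);
        return 0;
    }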

    How can you estimate the time between the actual expiration of the timer and your call to gettimeofday()? It is related to the process scheduling quantum. This quantum has a default value set by RR_TIMESLICE in include/linux/sched/rt.h, and you can check it on your system like this:

    #include <sched.h>
    #include <sys/types.h>
    #include <unistd.h>
    #include <stdio.h>
    
    int main(void) {
        struct timespec tp;
        if (sched_rr_get_interval(getpid(), &tp)) {
          perror("Cannot get scheduler quantum");
        } else {
          printf("Scheduler quantum is %f ms\n", (tp.tv_sec * 1e9 + tp.tv_nsec) / 1e6); 
        }
    }
    

    Output on my system:

    Scheduler quantum is 4.000000 ms
    

    So, you may need to wait for the scheduling quantum of another process to finish before you get control back and can read the current time. On my system this can lead to a deviation of approximately ±4 ms from the expected 200 ms. After performing almost 7000 iterations, I get the following distribution of the registered waiting times:

    [Figure: distribution of waiting times]

    As you can see, most of the times lie within ±2 ms of the expected 200 ms. The minimum and maximum times among all iterations were 189.992 ms and 210.227 ms respectively:

    ~$ sort times.txt | head
    189.992000
    190.092000
    190.720000
    194.402000
    195.250000
    195.746000
    195.847000
    195.964000
    196.256000
    196.420000
    ~$ sort times.txt | tail
    203.746000
    203.824000
    203.900000
    204.026000
    204.273000
    205.625000
    205.634000
    208.974000
    210.202000
    210.227000
    ~$
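    For reference, here is a sketch of how such a times.txt could be produced; it reuses timerCreateAndSet() and the variables from the question's code, and just prints one measured delay per line so the output can be redirected to a file and sorted as shown above:

    for (int a = 0; a < 7000; a++) {
        poll(&timerMon, 1, -1);
        (void) read(clockFd, &junk, sizeof(junk));

        gettimeofday(&tv_curr, NULL);
        // Print the measured delay, in ms, one value per line, so the
        // program can be run as: ./timertest_stackoverflow > times.txt
        printf("%f\n", (tv_curr.tv_sec - tv_prev.tv_sec) * 1e3
                       + (tv_curr.tv_usec - tv_prev.tv_usec) / 1e3);
        tv_prev = tv_curr;
    }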
    

    Deviations bigger than 4 ms are caused by rare situations in which the program needed to wait for several quanta, not just one.
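    If you want to double-check that scheduling is indeed the culprit, or to reduce this jitter, one option is to run the program under a real-time policy such as SCHED_FIFO, so that it preempts ordinary SCHED_OTHER tasks as soon as the timer fires. A minimal sketch, assuming the process has root privileges or CAP_SYS_NICE (the helper name and the priority value 10 are arbitrary choices):

    #include <sched.h>
    #include <stdio.h>

    // Hypothetical helper: switch the calling process (pid 0) to the
    // SCHED_FIFO real-time policy, so it gets the CPU ahead of ordinary
    // SCHED_OTHER tasks as soon as it becomes runnable.
    static int go_realtime(void) {
        struct sched_param sp = {.sched_priority = 10};

        if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1) {
            perror("sched_setscheduler");
            return -1;
        }
        return 0;
    }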