c++ operating-system system-calls cpu-cache

System call cost


I'm currently working on the overhead of operating system operations.

Specifically, I'm studying the cost of making a system call, and I've written a simple C++ program to observe it.

#include <cstdint>
#include <iostream>
#include <unistd.h>
#include <sys/time.h>

// Read the time-stamp counter with RDTSCP (x86 only).
uint64_t
rdtscp(void) {
  uint32_t eax, edx;

  __asm__ __volatile__("rdtscp" //! rdtscp instruction
               : "=a" (eax), "=d" (edx) //! outputs: low and high 32 bits
               : //! no inputs
               : "%ecx"); //! clobbered: rdtscp also writes ecx

  return (((uint64_t)edx << 32) | eax);
}

int main(void) {
  uint64_t before;
  uint64_t after;
  struct timeval t;
  for (unsigned int i = 0; i < 10; ++i) {
    before = rdtscp();
    gettimeofday(&t, NULL);
    after = rdtscp();
    std::cout << after - before  << std::endl;
    std::cout << t.tv_usec << std::endl;
  }

  return 0;
}

This program is quite straightforward.

  • The rdtscp function is just a wrapper around the RDTSCP instruction (a processor instruction which loads the 64-bit cycle count into two 32-bit registers). This function is used to take the timings.
  • I iterate 10 times. At each iteration I call gettimeofday and determine the time it took to execute (as a number of CPU cycles).

The results are quite unexpected:

8984
64008
376
64049
164
64053
160
64056
160
64060
160
64063
164
64067
160
64070
160
64073
160
64077

Odd lines in the output are the number of cycles needed to execute the system call. Even lines are the value contained in t.tv_usec (which is set by gettimeofday, the system call that I'm studying).

I don't really understand how this is possible: the number of cycles drops drastically, from roughly 9,000 to around 160! Yet the timeval struct is still updated on every call!

I've tried on different operating systems (Debian and macOS) and the results are similar.

Even if the cache is used, I don't see how this is possible. Making a system call should result in a context switch from user mode to kernel mode, and the clock still has to be read in order to update the time.

Does anyone have an idea?


Solution

  • The answer? Try another system call. Linux has vsyscalls (and the vDSO), which accelerate certain system calls: What are vdso and vsyscall?

    The short version: the system call is not actually performed. Instead, the kernel maps a region of memory into the process, where the process can read the time information directly. The cost? Not much (no context switch).
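
One way to check this yourself is to compare the library call against a call that is forced through the kernel. The following is only a rough sketch, assuming Linux with glibc, where syscall(SYS_gettimeofday, ...) bypasses the user-space fast path. The helper names average_ns, time_libc_gettimeofday and time_raw_gettimeofday are made up for this illustration, and std::chrono::steady_clock is used instead of RDTSCP so the sketch does not depend on x86.

```cpp
#include <cassert>
#include <chrono>
#include <sys/syscall.h>
#include <sys/time.h>
#include <unistd.h>

// Hypothetical helper: average wall-clock nanoseconds per call of `fn`,
// measured over `iterations` back-to-back calls.
template <typename Fn>
double average_ns(Fn fn, int iterations) {
  using clock = std::chrono::steady_clock;
  const auto start = clock::now();
  for (int i = 0; i < iterations; ++i) {
    fn();
  }
  const auto stop = clock::now();
  return std::chrono::duration<double, std::nano>(stop - start).count() /
         iterations;
}

// Library path: on Linux this is normally served from the vDSO,
// i.e. without entering the kernel (assumption: vDSO is enabled).
double time_libc_gettimeofday(int iterations) {
  struct timeval t;
  return average_ns(
      [&t] {
        int rc = gettimeofday(&t, nullptr);
        assert(rc == 0);
        (void)rc;  // silence unused warning when NDEBUG is defined
      },
      iterations);
}

// Forced path: syscall() issues a real system call every time,
// so each iteration pays for the user/kernel transition.
double time_raw_gettimeofday(int iterations) {
  struct timeval t;
  return average_ns(
      [&t] {
        long rc = syscall(SYS_gettimeofday, &t, nullptr);
        assert(rc == 0);
        (void)rc;  // silence unused warning when NDEBUG is defined
      },
      iterations);
}
```

Calling both functions with the same iteration count (for example 100000) and printing the two averages should show whether the library call is noticeably cheaper on your machine. The exact numbers depend on the CPU, the kernel version and the configured clock source, so treat them as indicative only.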