Tags: python, linux, system-calls, strace, vdso

How does Python determine the PID seemingly without normal system calls in Linux?


When running the following command

strace -f python3 -c 'import os; print(os.getpid())'

I noticed that strace does not catch any call to the getpid(2) system call. I first suspected that glibc was caching the PID, but there shouldn't be a PID for libc to cache without at least one real system call. Then I considered that the vDSO might be the culprit, but a C program that makes this call through libc shows a getpid call when straced. I finally gave up and looked up the source of os.getpid, which appears to be defined in Modules/posixmodule.c. To my surprise (and subsequent confusion), it makes a normal call to getpid!

So my question is: how does Python determine the result of os.getpid? And if that value is indeed obtained by a call to getpid, how is that call actually being made?


Solution

  • The way the vDSO works is, among other things, by mapping kernel-maintained data into user space that the vDSO functions know how to read. One example is the current time, so gettimeofday doesn't need to make a syscall to access that information.

    Now, specifically for getpid, it's not actually a vDSO call. In glibc before 2.25, the library cached the process ID, and since part of the Python runtime calls getpid, no further getpid syscalls were made after the first. From 2.25 onward, the library doesn't cache the process ID, so every getpid call results in a syscall.
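
To see which side of the 2.25 boundary a given machine falls on, something like the following rough sketch may help. It assumes a Linux system whose C library is glibc; glibc_version and pid_cached_by_glibc are hypothetical helper names made up for this example, not part of Python or glibc. It also calls the libc getpid wrapper directly through ctypes to show that it agrees with os.getpid.

```python
import ctypes
import os


def glibc_version():
    """Return the glibc version as a tuple such as (2, 31), or None if unknown."""
    try:
        text = os.confstr("CS_GNU_LIBC_VERSION")  # e.g. "glibc 2.31"
    except (ValueError, OSError, AttributeError):
        return None  # not glibc (for example musl) or not a POSIX system
    if not text:
        return None
    name, _, version = text.partition(" ")
    if name != "glibc":
        return None
    try:
        return tuple(int(part) for part in version.split(".")[:2])
    except ValueError:
        return None


def pid_cached_by_glibc(version):
    """Apply the rule from the answer: glibc older than 2.25 cached the PID."""
    return version is not None and version < (2, 25)


# Call the C library's getpid wrapper directly, bypassing os.getpid.
libc = ctypes.CDLL(None)
libc.getpid.restype = ctypes.c_int
pid_from_libc = libc.getpid()
pid_from_python = os.getpid()

version = glibc_version()
print("glibc version:", version)
print("PID cached by libc (assumed from version):", pid_cached_by_glibc(version))
print("libc getpid:", pid_from_libc, "os.getpid:", pid_from_python)
```

If the script reports that the PID is not cached, running it under strace should show a getpid line for each of the two calls at the end; on an older glibc those lines would be expected to be missing.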