linux linux-kernel kernel scheduler affinity

Is CPU affinity enforced across system calls?

So if I set a process's CPU affinity using:

sched_setaffinity()

and then perform some other system call using that process, is that system call ALSO guaranteed to execute on the same CPU enforced by sched_setaffinity?

Essentially, I'm trying to enforce that a process, and the system calls it makes, are executed on the same core. Obviously I can use sched_setaffinity() to enforce userspace code will execute on only one CPU, but does that same system call enforce kernel-space code in that process context will execute on the same core as well?

Thanks!

Solution

Syscalls are really just your process code switching from user to kernel mode. The task that is being run does not change at all, it just temporarily enters kernel mode to execute the syscall and then returns back to user mode.

A task can be preempted by the scheduler and moved to a different CPU, and this can happen in the middle of normal user mode code or even in the middle of a syscall.

By setting the task affinity to a single CPU using sched_setaffinity(), you remove this possibility, since even if the task gets preempted, the scheduler has no choice but to keep it running on the same CPU (it may of course change the currently running task, but when your task resumes it will still be on the same CPU).

So to answer your question:

does that same system call enforce kernel-space code in that process context will execute on the same core as well?

Yes, it does.

Now, to address @Barmar's comment: in the case of syscalls that can "sleep", this does not mean that the task could change CPU if the affinity does not allow it.

What happens when a syscall sleeps, is simply that the syscall code tells the scheduler: "hey, I'm waiting for something, just run another task while I wait and wake me up later". When the syscall resumes, it checks if the requested resource is available (it could even tell the kernel exactly when it wants to be waken up), and if not it either waits again or returns to user code saying "sorry, I got nothing, try again". The resource could of course be made available by some interrupt that causes an interrupt handler to run on a different CPU, but that's a different story, and it doesn't really matter. To put it simply: interrupt code does not run in process context, at all. For what the task executing the syscall is concerned, the resource is just magically there when execution resumes.