Are Linux system calls executed inside an exception handler?


I understand that after entering a system call with, e.g., the syscall or int 0x80 (x86/x86-64) or svc (ARM) instruction, we stay in the calling process's context (but switch from user to kernel mode) from the Linux kernel's point of view. However, from the hardware's point of view, we jump into a syscall/svc/... exception handler. Is the whole system call code executed inside the exception handler in Linux?


Solution

  • Using the terminology that's common for 80x86 (from Intel's manuals, etc.), the CPU has a "current privilege level" (CPL) that determines whether code is restricted (e.g. whether privileged instructions are permitted), and this is the basis of "user space vs. kernel space". The things that trigger a switch from CPL=3 ("user space") to CPL=0 ("kernel space") are:

    • exceptions, which typically indicate that a problem (e.g. division by zero) was detected by the CPU

    • IRQs, which indicate that a device needs attention

    • software interrupts, call gates, and the syscall and sysenter instructions. These are all different ways for software to explicitly ask the OS/kernel for something (kernel system calls). Different operating systems/kernels may support only some, or just one, of them: 64-bit code only needs syscall, and the other alternatives probably won't be supported by the OS/kernel unless it's trying to provide backward compatibility for obsolete 32-bit code.

    • task gates (obsolete; not supported in 64-bit mode and not used by any well-known 32-bit OS).

    Using this terminology, it'd be wrong to say that Linux system calls are executed in an exception handler (because an exception is something specific that isn't involved).
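
    To make the "explicit request" path from the list above concrete, here is a minimal sketch in C (assuming GCC or Clang on Linux). The helper names current_privilege_level and raw_getpid are made up for this illustration. The first reads the CPL from the low two bits of the CS selector; the second issues the syscall instruction directly on x86-64 and falls back to libc's syscall() wrapper elsewhere.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE        /* makes sure unistd.h declares syscall() */
#endif
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* getpid(), syscall() */

/* Hypothetical helper: returns the current privilege level, i.e. the low
 * two bits of the CS selector, or -1 when not built for x86/x86-64. */
static int current_privilege_level(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned short cs;
    __asm__ volatile ("mov %%cs, %0" : "=r"(cs));
    return cs & 3;
#else
    return -1;
#endif
}

/* Hypothetical helper: asks the kernel for the process ID without going
 * through the getpid() wrapper. On x86-64 Linux it executes the syscall
 * instruction itself; the CPU overwrites rcx and r11 during the switch to
 * CPL=0, so both are listed as clobbered. */
static long raw_getpid(void)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)                 /* result comes back in rax */
                      : "a"((long)SYS_getpid)     /* system call number in rax */
                      : "rcx", "r11", "memory");
    return ret;
#else
    return syscall(SYS_getpid);                   /* portable fallback via libc */
#endif
}
```

    In an ordinary user-space process on x86, current_privilege_level() should report 3, and raw_getpid() should return the same value as getpid(). No fault or error condition is involved at any point: the program simply asks for the switch.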

    However...

    Different people define terminology differently, and some (e.g. ARM) define "exception" as a synonym for "anything that causes a switch to kernel space". This makes some sense for CPU designers, who are primarily focused on the impact that any switch to supervisor mode has on the CPU and have little reason to care about the differences (because the differences are mostly a software developer's problem). For everyone else (software developers), using that terminology you could say that everything in the kernel is executed inside an exception handler, which mostly makes the word "exception" meaningless (because "could be anything at all" doesn't provide any additional information). In other words, using that terminology, "Linux system calls are executed inside an exception handler" is technically correct but could be shortened to "Linux system calls are executed" without changing the statement's meaning.

    Note: Recently Intel published a draft proposal for a possible future extension that would (if adopted, supported by the CPU and enabled by the OS) replace all of the above with a new "events" scheme, where the many separate handlers (exception, IRQ, system call, ...) are replaced by a single "event handler" (which would have to fetch an "event reason" provided by the CPU and then branch to "event reason specific" code). If that happens I'd expect a third set of terminology (e.g. "exception event", "IRQ event" and "system call event"), where all of the kernel's code is executed in the context of some kind of event, and where "Linux system calls are executed inside an event handler" would be technically correct but could be shortened to "Linux system calls are executed".
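
    The "single event handler" idea in the note above can be sketched roughly as follows. This is purely illustrative C and does not follow Intel's draft or any real kernel code: the event_reason enum, the counters and handle_event are all names invented for this sketch, and the counters stand in for the "event reason specific" code.

```c
/* Hypothetical event reasons; a real CPU would supply its own encoding. */
enum event_reason { EVENT_EXCEPTION, EVENT_IRQ, EVENT_SYSCALL };

/* Counters standing in for the reason-specific code paths (illustrative). */
static int exceptions_seen;
static int irqs_seen;
static int syscalls_seen;

/* Single entry point: look at the reason, then branch to the matching
 * code. Returns 0 if the reason was recognised and -1 otherwise. */
static int handle_event(enum event_reason reason)
{
    switch (reason) {
    case EVENT_EXCEPTION:
        exceptions_seen++;   /* e.g. division by zero */
        return 0;
    case EVENT_IRQ:
        irqs_seen++;         /* a device needs attention */
        return 0;
    case EVENT_SYSCALL:
        syscalls_seen++;     /* software explicitly asked the kernel */
        return 0;
    }
    return -1;               /* unknown reason */
}
```

    With one entry point like this, every path into the kernel is "inside the event handler", which is why the phrase would stop carrying information.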