linux, x86, kernel, tlb

Who performs the TLB shootdown?


I read this SO question describing what a TLB shootdown is. I'm trying to understand if this is an operation performed by the kernel or by the processor or both?

My questions are:

  1. Does a TLB shootdown happen upon context switch? I would assume no, because there is a need to be able to execute multiple processes concurrently on multiprocessor CPUs. Is this assumption correct?
  2. When exactly does a TLB shootdown happen?
  3. Who performs the actual TLB shootdown? Is it the kernel (if so, where can I find the code that performs the flushing?), or is it the CPU (if so, what triggers the action?), or is it both (the kernel executes an instruction which causes an interrupt, which in turn causes the CPU to perform the TLB shootdown)?

Solution

  • The x86 TLBs are not shared across cores and are not synchronized with one another at the hardware level.
    It is the OS that instructs a processor to flush its TLB.
    Instructing the "current" processor amounts to calling a function; instructing another processor amounts to sending an IPI (inter-processor interrupt).

    The term "TLB shootdown" refers specifically to this case, which is even more expensive than an ordinary flush: to keep the system consistent, the OS has to tell other processors to invalidate their TLB entries so that they all see the same mapping as the processor that changed it.

    I think this is only necessary if the new mapping affects memory that is shared with other processors; otherwise each processor is running a different process, each with its own mapping.

    During a context switch, the TLB is flushed to remove the old mapping. This must be done regardless of which processor the scheduled program last ran on.
    Since the processor is flushing its own TLB, this is not a TLB shootdown.

    Shared areas that must be kept consistent at all times across processors include kernel pages, memory-mapped I/O, and shared memory-mapped files.

    Executing invlpg or invpcid, moving to cr0, cr3 (including during a hardware task switch), or cr4, and performing a VMX transition can all invalidate TLB entries.
    For the exact granularity and semantics, see section 4.10.4 of Volume 3 of the Intel Software Developer's Manual.
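
To make the distinction between a local flush and a shootdown concrete, here is a minimal user-space sketch in C. It is only a toy model, not kernel code: every name in it (`cpus`, `tlb_fill`, `local_flush_page`, `local_flush_all`, `ipi_flush_handler`, `tlb_shootdown`) is hypothetical, the "IPI" is simulated with an ordinary function call, and the TLB is assumed to be a tiny direct-mapped array. The real Linux logic (in arch/x86/mm/tlb.c on x86, as far as I know) is far more involved.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPUS     4   /* hypothetical number of cores */
#define TLB_SLOTS 8   /* hypothetical, tiny direct-mapped TLB */

struct tlb_entry {
    bool valid;
    unsigned long vpn;   /* virtual page number */
    unsigned long pfn;   /* physical frame number */
};

struct cpu {
    struct tlb_entry tlb[TLB_SLOTS];  /* private to this core, never shared */
    unsigned ipis_received;           /* counts simulated IPIs */
};

struct cpu cpus[NCPUS];

/* Cache a translation in one core's private TLB. */
void tlb_fill(int cpu, unsigned long vpn, unsigned long pfn)
{
    assert(cpu >= 0 && cpu < NCPUS);
    struct tlb_entry *e = &cpus[cpu].tlb[vpn % TLB_SLOTS];
    e->valid = true;
    e->vpn = vpn;
    e->pfn = pfn;
}

/* Return true and store the frame number if this core has the page cached. */
bool tlb_lookup(int cpu, unsigned long vpn, unsigned long *pfn)
{
    assert(cpu >= 0 && cpu < NCPUS);
    const struct tlb_entry *e = &cpus[cpu].tlb[vpn % TLB_SLOTS];
    if (!e->valid || e->vpn != vpn)
        return false;
    *pfn = e->pfn;
    return true;
}

/* Local single-page invalidation: stands in for invlpg on the current core. */
void local_flush_page(int cpu, unsigned long vpn)
{
    assert(cpu >= 0 && cpu < NCPUS);
    struct tlb_entry *e = &cpus[cpu].tlb[vpn % TLB_SLOTS];
    if (e->valid && e->vpn == vpn)
        e->valid = false;
}

/* Local full flush: stands in for what a cr3 reload does on a context
 * switch. No other core is involved, so this is not a shootdown. */
void local_flush_all(int cpu)
{
    assert(cpu >= 0 && cpu < NCPUS);
    for (int i = 0; i < TLB_SLOTS; i++)
        cpus[cpu].tlb[i].valid = false;
}

/* What a remote core runs when the (simulated) IPI arrives. */
void ipi_flush_handler(int cpu, unsigned long vpn)
{
    cpus[cpu].ipis_received++;
    local_flush_page(cpu, vpn);
}

/* Shootdown: the initiator flushes its own TLB with a plain call and
 * "sends an IPI" to every other core selected in cpu_mask. */
void tlb_shootdown(int initiator, unsigned cpu_mask, unsigned long vpn)
{
    local_flush_page(initiator, vpn);
    for (int c = 0; c < NCPUS; c++)
        if (c != initiator && (cpu_mask & (1u << c)))
            ipi_flush_handler(c, vpn);
}
```

For example, after filling page 5 on cores 0, 1 and 2, calling `tlb_shootdown(0, 0x3, 5)` drops the entry on cores 0 and 1 while core 2 keeps its (now possibly stale) copy. Only core 1 receives a simulated IPI, which reflects the point above: the software decides which processors to interrupt, and each processor then invalidates its own TLB.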