Tags: linux-kernel, linux-device-driver, scheduler, cpu-architecture, smp

Who does NAPI scheduling?


I had a question while reading about NAPI scheduling in network drivers.

Typically, the entire network-processing code runs in softirq context, and with the NAPI polling mechanism the driver polls for packets after an interrupt arrives.

So, if NAPI code also runs in softirq context, how can one schedule it? (Interrupt-context code cannot be scheduled.)

And what is the use of work-queues in network drivers?


Solution

  • Scheduling in the NAPI sense just means marking it as needing to run. In other words, you simply make the function call saying "schedule me to run in the NAPI softirq". This causes your driver's poll function to be added to a list of "need-to-be-polled" devices, and it also causes the NAPI softirq to be activated "on the way out."

    So it typically works this way. Your driver configures your device to tell it to interrupt at some point in the future when some packets (ideally more than one so as to amortize the overhead) are ready to be processed. In the meantime, the kernel schedules ordinary user-space processes...

    When your device interrupts:

    • The interrupt causes a transition to kernel mode, if the CPU is not already in kernel mode.
    • The Linux interrupt-handling code finds your driver's interrupt handler routine and invokes it.
    • Your interrupt handler calls napi_schedule(), placing your poll function on a list of devices to poll and raising the softirq (see the sketch after this list).
    • Your interrupt handler returns.
    • Just before returning to user-mode (or whatever the CPU was doing prior to the interrupt), the interrupt handling code sees that the network softirq needs to run and activates it.
    • The softirq calls your poll function to process incoming packets until you have no more packets or until the NAPI "budget" is exhausted. (In the latter case, the softirq is later re-invoked by the ksoftirqd thread [I think].)
    • Your driver then re-enables interrupts on your device only once it has finished processing all the ready-to-process packets.
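
    To make that flow concrete, here is a minimal sketch of an interrupt handler and poll function. The struct my_priv layout and the my_hw_*() helpers are hypothetical stand-ins for device-specific code; napi_schedule(), napi_complete_done(), and container_of() are the actual kernel API.

    ```c
    /*
     * Minimal sketch of the interrupt/poll interplay described above.
     * my_hw_*() and struct my_priv are hypothetical; the NAPI calls
     * are the real kernel API.
     */
    #include <linux/interrupt.h>
    #include <linux/netdevice.h>

    struct my_priv {
        struct napi_struct napi;
        /* ... device-specific state ... */
    };

    /* Hypothetical device-specific helpers. */
    void my_hw_disable_rx_irq(struct my_priv *priv);
    void my_hw_enable_rx_irq(struct my_priv *priv);
    bool my_hw_rx_one(struct my_priv *priv);    /* true if a packet was handled */

    static irqreturn_t my_isr(int irq, void *data)
    {
        struct my_priv *priv = data;

        /* Quiet the device, then "schedule" NAPI: this adds our poll
         * function to the per-CPU poll list and raises the softirq. */
        my_hw_disable_rx_irq(priv);
        napi_schedule(&priv->napi);

        return IRQ_HANDLED;
    }

    static int my_poll(struct napi_struct *napi, int budget)
    {
        struct my_priv *priv = container_of(napi, struct my_priv, napi);
        int done = 0;

        /* Process at most "budget" packets in this softirq pass. */
        while (done < budget && my_hw_rx_one(priv))
            done++;

        if (done < budget) {
            /* All ready packets handled: leave polled mode and let
             * the device interrupt us again. */
            napi_complete_done(napi, done);
            my_hw_enable_rx_irq(priv);
        }
        /* If done == budget, we stay on the poll list and will be
         * called again (possibly from ksoftirqd). */
        return done;
    }
    ```

    The poll function would have been registered with the kernel earlier, typically via netif_napi_add() at probe time.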

    In my experience, work-queues are typically used only for certain long-duration operations that need to be done in a schedulable context (i.e. a real task context that can block [sleep] while waiting for something to complete).

    For example, in the Intel ixgbe driver, if the driver detects that the NIC needs to be reset, a work-queue entry is triggered to re-initialize the NIC. Some of the operations require relatively long sleeps, and you don't want to tie up the processor in a softirq context if you can let some other task run. There may be other reasons as well -- for example, needing to allocate large amounts of memory, which may require the allocation call to be made in task context.
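
    A minimal sketch of that work-queue pattern, loosely modeled on the reset path mentioned above: struct my_priv and the my_hw_*() helpers are hypothetical, while INIT_WORK(), schedule_work(), and msleep() are the actual kernel API.

    ```c
    /*
     * Sketch of deferring a long, sleep-requiring operation to a
     * work-queue. my_hw_*() and struct my_priv are hypothetical.
     */
    #include <linux/workqueue.h>
    #include <linux/delay.h>

    struct my_priv {
        struct work_struct reset_work;
        /* ... device-specific state ... */
    };

    /* Hypothetical device-specific helpers. */
    void my_hw_reset(struct my_priv *priv);
    void my_hw_reinit(struct my_priv *priv);

    static void my_reset_task(struct work_struct *work)
    {
        struct my_priv *priv = container_of(work, struct my_priv, reset_work);

        /* Runs in task context (a kernel worker thread), so sleeping
         * is allowed here. */
        my_hw_reset(priv);      /* strobe the reset bit */
        msleep(50);             /* wait for the hardware to settle */
        my_hw_reinit(priv);     /* reprogram the NIC */
    }

    static void my_setup(struct my_priv *priv)
    {
        /* Done once, e.g. at probe time. */
        INIT_WORK(&priv->reset_work, my_reset_task);
    }

    static void my_handle_hang(struct my_priv *priv)
    {
        /* Called from atomic context (ISR, softirq, timer), where we
         * must not sleep: queue the work and return immediately. */
        schedule_work(&priv->reset_work);
    }
    ```

    The key point is the split: the atomic-context code only queues the work, while everything that might sleep happens later in the worker thread.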