Why kernel preemption is safe only when preempt_count == 0?

Linux kernel 2.6 introduced a new per-thread field---preempt_count---which is incremented/decremented whenever a lock is acquired/released. This field is used to allow kernel preemption: "If need_resched is set and preempt_count is zero, then a more important task is runnable and it is safe to preempt."

According to the "Linux Kernel Development" book by Robert Love: "So when is it safe to reschedule? The kernel is capable of preempting a task running in the kernel so long as it does not hold a lock."

My question is: why isn't it safe to preempt a task running in the kernel while this task holds a lock?

If another task is scheduled and tries to grab the lock, it will block (or spin until its time slice ends), so we wouldn't get the two threads simultaneously inside the same critical section. Can anyone please outline a problematic scenario in case we do preempt a task that holds a lock in kernel-mode?

Thanks!

Solution

While this is an old question, the accepted answer isn't correct.

First of all the title is asking:

Why kernel preemption is safe only when preempt_count > 0?

This isn't correct, it's the opposite. Kernel preemption is disabled when preempt_count > 0, and enabled when preempt_count == 0.

Furthermore, the claim:

If another task is scheduled and tries to grab the lock, it will block (or spin until its time slice ends),

Is not always true.

Say you acquire a spin lock. Preemption is enabled. A process switch happens, and in the context of the new process some softirq runs. Preemption is disabled while running softirqs. If one of those softirqs attempts to accquire your lock it will never stop spinning because preemption is disabled. Thus you have a deadlock.

You have no control over whether the process that preempts yours will run softirqs or not. The preempt_count field where you disable softirqs is process-specific. Softirqs have to run with preemption disabled to preserve the per-cpu serialization of softirqs.