c linux-kernel scheduling spinlock preemption

Why does spin_unlock_bh() enable preemption without calling the scheduler?

I was looking into the spinlock code of the Linux kernel (version 3.10.1) and didn't understand one thing.

When acquiring the lock with spin_lock_bh(), the code calls preempt_disable(), just like the other acquire functions such as spin_lock() and spin_lock_irq().

But when releasing the lock with spin_unlock_bh(), it calls preempt_enable_no_resched(), which skips the check that would let the scheduler preempt the task. That is not the case for the other release functions (like spin_unlock() and spin_unlock_irq()). They call the regular preempt_enable(), which checks for a pending reschedule and can end up in __schedule().

Solution

local_bh_disable() increments the preempt_count counter by SOFTIRQ_DISABLE_OFFSET (2 * SOFTIRQ_OFFSET), and preempt_disable() increments it by 1. __raw_spin_lock_bh() does both.

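preempt_enable() (which __raw_spin_unlock() and __raw_spin_unlock_irq() call) ends with preempt_check_resched(). In the BH path, however, preempt_count still contains SOFTIRQ_DISABLE_OFFSET after preempt_enable_no_resched(), so preemption is still disabled and there is no point trying to schedule: preempt_schedule() would simply return because preempt_count is non-zero. The check is done instead at the end of _local_bh_enable_ip(), once the counter really drops to zero.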

Looking at the source code, you can see that the actual "BH" spinlock unlock sequence is:

spin_release(&lock->dep_map, 1, _RET_IP_);
do_raw_spin_unlock(lock);
preempt_enable_no_resched();
    \____barrier();
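    \____dec_preempt_count(); // <--- decrease the counter, but the BH offset keeps preemption disabled, so we can't schedule here
local_bh_enable_ip((unsigned long)__builtin_return_address(0));
    \____sub_preempt_count(SOFTIRQ_DISABLE_OFFSET - 1); // <--- softirqs re-enabled, preemption still disabled
    \____do_softirq(); // <--- only if softirqs are pending
    \____dec_preempt_count(); // <--- preemption really re-enabled here
    \____preempt_check_resched(); // <--- schedule, if needed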

Compare this with, for example, the "IRQ" spinlock unlock sequence:

spin_release(&lock->dep_map, 1, _RET_IP_);
do_raw_spin_unlock(lock);
local_irq_enable();
preempt_enable();
    \____barrier();
    \____dec_preempt_count(); // <--- preemption really re-enabled here
    \____barrier();
    \____preempt_check_resched(); // <--- schedule, if needed

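To sum up: in the BH-spinlock case, preempt_enable_no_resched() bypasses preempt_check_resched() because preemption is still disabled at that point (the softirq offset is still in preempt_count), so the check would be useless. The reschedule check is not lost; it happens at the end of _local_bh_enable_ip(), once preempt_count actually drops to zero.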