c linux-kernel scheduling spinlock preemption

Why does spin_unlock_bh() enable preemption without calling the scheduler?


I was looking into the spinlock code of the Linux kernel (version 3.10.1), and there is one thing I didn't understand.

When acquiring a spinlock through spin_lock_bh(), the kernel calls preempt_disable(). This is the same as in the other acquire functions, for example spin_lock() and spin_lock_irq().

But when releasing the lock through spin_unlock_bh(), it calls preempt_enable_no_resched(), which skips the check for a pending reschedule. That is not the case for the other release functions (such as spin_unlock() and spin_unlock_irq()). They call the regular preempt_enable(), which can end up in __schedule() if a reschedule is pending.


Solution

  • __raw_spin_lock_bh() does two things to preempt_count: local_bh_disable() adds the softirq-disable offset (SOFTIRQ_DISABLE_OFFSET) to it, and preempt_disable() then increments it by 1.

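    preempt_enable() (which is invoked from __raw_spin_unlock() and __raw_spin_unlock_irq()) invokes preempt_check_resched(). In the BH case there is no point in trying to schedule at that step: after the decrement, preempt_count still holds the softirq-disable offset, so preemption is still disabled. The reschedule check is done at the end of _local_bh_enable_ip() instead, once the count has dropped back.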

    Looking at the source code, you can see that the real "BH" spin unlock call sequence is:

    spin_release(&lock->dep_map, 1, _RET_IP_);
    do_raw_spin_unlock(lock);
    preempt_enable_no_resched();
        \____barrier();
        \____dec_preempt_count(); // <--- decrease counter, but we can't schedule here
    local_bh_enable_ip();
        \____sub_preempt_count() // <--- preemption is really re-enabled here
        \____preempt_check_resched(); // <--- schedule
    

    By contrast, the "IRQ" spin unlock call sequence is:

    spin_release(&lock->dep_map, 1, _RET_IP_);
    do_raw_spin_unlock(lock);
    local_irq_enable();
    preempt_enable();
        \____barrier();
        \____dec_preempt_count(); // <--- preemption is really re-enabled here
        \____barrier();
        \____preempt_check_resched(); // <--- schedule
    

    To sum up: the BH unlock path skips preempt_check_resched() at the preempt_enable step because it would be useless there; the check happens once, inside local_bh_enable_ip().
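
The counter arithmetic above can be sketched as a small user-space model. This is not kernel code: the function names mirror the kernel's, but the bodies are simplified stand-ins. The value 0x200 for SOFTIRQ_DISABLE_OFFSET is assumed from the 3.10 headers, and run_model() and struct model_result are hypothetical helpers invented for the illustration. The model also leaves out the softirq processing that the real _local_bh_enable_ip() performs.

```c
#include <stdbool.h>

/* User-space model only; assumed to match 3.10's 2 * SOFTIRQ_OFFSET. */
#define SOFTIRQ_DISABLE_OFFSET 0x200

static int  preempt_count;   /* 0 means the task may be preempted      */
static bool need_resched;    /* stands in for TIF_NEED_RESCHED         */
static int  schedule_calls;  /* times the model "entered the scheduler" */
static int  wasted_calls;    /* preempt_schedule() calls that bailed out */

/* Like the kernel's preempt_schedule(): refuse while the count is nonzero. */
static void preempt_schedule(void)
{
    if (preempt_count != 0) {
        wasted_calls++;
        return;
    }
    schedule_calls++;
    need_resched = false;
}

static void preempt_check_resched(void)
{
    if (need_resched)
        preempt_schedule();
}

static void preempt_disable(void)           { preempt_count++; }
static void preempt_enable_no_resched(void) { preempt_count--; }
static void preempt_enable(void)            { preempt_count--; preempt_check_resched(); }
static void local_bh_disable(void)          { preempt_count += SOFTIRQ_DISABLE_OFFSET; }

/* Simplified: the real function also runs pending softirqs before the check. */
static void local_bh_enable(void)
{
    preempt_count -= SOFTIRQ_DISABLE_OFFSET;
    preempt_check_resched();
}

struct model_result {        /* hypothetical helper type */
    int schedule_calls;
    int wasted_calls;
    int final_preempt_count;
};

/*
 * Hypothetical driver: take the "BH lock", let a reschedule become pending
 * while it is held, then release it either the way spin_unlock_bh() does
 * (use_plain_preempt_enable == false) or with a plain preempt_enable().
 */
struct model_result run_model(bool use_plain_preempt_enable)
{
    struct model_result r;

    preempt_count = 0;
    need_resched = false;
    schedule_calls = 0;
    wasted_calls = 0;

    local_bh_disable();      /* count = 0x200 */
    preempt_disable();       /* count = 0x201 */
    need_resched = true;     /* e.g. a wakeup happened under the lock */

    if (use_plain_preempt_enable)
        preempt_enable();            /* count = 0x200, check cannot succeed */
    else
        preempt_enable_no_resched(); /* count = 0x200, no check attempted   */
    local_bh_enable();               /* count = 0, check succeeds here      */

    r.schedule_calls = schedule_calls;
    r.wasted_calls = wasted_calls;
    r.final_preempt_count = preempt_count;
    return r;
}
```

In this model both variants reach the scheduler exactly once, from local_bh_enable(). The only difference is that the plain preempt_enable() variant makes one extra preempt_schedule() call that returns immediately because the count is still 0x200, which is the call the BH unlock path avoids.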