I came across this piece of code in the disk-read path of the Linux 0.11 kernel:
static inline void lock_buffer(struct buffer_head * bh)
{
	cli();
	while (bh->b_lock)
		sleep_on(&bh->b_wait);
	bh->b_lock=1;
	sti();
}
IIUC, cli() will block interrupts (not all of them, as explained here: https://c9x.me/x86/html/file_module_x86_id_31.html, but it still blocks some interrupts, which means it changes the default behavior). And sleep_on will call schedule, which will pass control to another process.
However, what confuses me is that we switch to another process while some interrupts are blocked, which seems error-prone because the other process should expect the default behavior. So is this code correct (and if so, why?), or is it simply wrong and liable to cause unexpected behavior?
I presume that the disk drive's interrupt handler is the one that calls wakeup(&bh->b_wait). If interrupts were not disabled in the process waiting for this block, that wakeup could be missed.
Remember that condition variables (sleep_on, wakeup) have no memory: sleep_on suspends the caller until the next wakeup, and a wakeup issued just before sleep_on is simply lost.
From the moment it tests bh->b_lock, the caller is racing with the interrupt handler; cli (or, more typically on Unix, splbio()) blocks the interrupt handler and so prevents the race.
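To make the race concrete, here is a small user-space sketch in C. It is a toy model and not kernel code: toy_buf, toy_disk_irq, toy_raise_irq and toy_lost_wakeup are hypothetical names, and the "interrupt" is just a function call placed between the test of the lock and the sleep. It assumes, as described above, that a wakeup with no sleeper leaves no trace.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every name here is hypothetical. */
struct toy_buf {
    bool locked;       /* stands in for bh->b_lock */
    bool sleeper;      /* true while a task is parked on the wait queue */
    bool irq_pending;  /* disk interrupt raised but not yet delivered */
};

static bool irqs_enabled = true;

/* Disk interrupt handler: unlock the buffer and wake whoever sleeps now.
 * The wakeup has no memory, so it does nothing if nobody is sleeping. */
static void toy_disk_irq(struct toy_buf *b)
{
    b->locked = false;
    b->sleeper = false;
}

/* Deliver the interrupt at once if enabled, otherwise hold it pending. */
static void toy_raise_irq(struct toy_buf *b)
{
    if (irqs_enabled)
        toy_disk_irq(b);
    else
        b->irq_pending = true;
}

/* Replay "test, then sleep" with the disk finishing in between.
 * Returns true if the task is left asleep with nobody to wake it. */
static bool toy_lost_wakeup(bool use_cli)
{
    struct toy_buf b = { .locked = true, .sleeper = false, .irq_pending = false };

    irqs_enabled = !use_cli;        /* cli(), or nothing */

    bool saw_locked = b.locked;     /* while (bh->b_lock) */
    toy_raise_irq(&b);              /* the disk finishes right here */
    if (saw_locked)
        b.sleeper = true;           /* sleep_on(&bh->b_wait) */

    /* The next task runs with interrupts enabled, so a held
     * interrupt is delivered only after the sleeper is queued. */
    irqs_enabled = true;
    if (b.irq_pending) {
        b.irq_pending = false;
        toy_disk_irq(&b);
    }

    assert(!b.locked);              /* the disk did finish in both runs */
    return b.sleeper;
}
```

In this model toy_lost_wakeup(false) leaves the task asleep forever, while toy_lost_wakeup(true) does not, because the interrupt is held back until the sleeper is already on the queue.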
Since the kernel saves the interrupt state (mask, priority, ...) along with the rest of the process state, when sleep_on causes a reschedule the next process will most likely run with interrupts enabled, or will enable them eventually. The disk interrupt will then run and wake up this process.
When this process is rescheduled, its saved interrupt state (disabled) is restored, so the test and assignment of b_lock are again protected from the disk interrupt handler.
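The save-and-restore behavior can be sketched the same way. This is again a toy model with hypothetical names (toy_task, toy_switch, toy_sleep_and_resume); it only assumes that the interrupt-enable flag is part of each task's saved state, as it is on x86 where the flag lives in EFLAGS.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every name here is hypothetical. */
struct toy_task {
    bool saved_if;  /* stands in for the IF bit of the task's saved EFLAGS */
};

struct toy_trace {
    bool if_in_b;       /* flag seen by task B while A sleeps */
    bool if_back_in_a;  /* flag seen by task A after it resumes */
};

static bool cpu_if = true;  /* the CPU's live interrupt-enable flag */

/* Context switch: save the live flag into the outgoing task,
 * then load the incoming task's own copy. */
static void toy_switch(struct toy_task *from, struct toy_task *to)
{
    from->saved_if = cpu_if;
    cpu_if = to->saved_if;
}

static struct toy_trace toy_sleep_and_resume(void)
{
    struct toy_task a = { .saved_if = true };
    struct toy_task b = { .saved_if = true };
    struct toy_trace t;

    cpu_if = false;          /* task A: cli() */
    toy_switch(&a, &b);      /* A: sleep_on() -> schedule() picks B */
    t.if_in_b = cpu_if;      /* B runs with its own flag */
    toy_switch(&b, &a);      /* later, A is woken and scheduled again */
    t.if_back_in_a = cpu_if; /* A is back inside its cli() region */
    return t;
}
```

Here task B sees interrupts enabled even though A went to sleep after cli(), and A sees them disabled again once it resumes, which is why the other process is not affected by A's cli().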