Tags: assembly, x86, interrupt, x86-16

A working example of the CLI and STI instructions in x86 16-bit assembly


I would like a practical example of the cli and sti instructions in x86 16-bit assembly, i.e. code written in this language that lets me learn by practice what these instructions are for, and go further than the theory.

I know the documentation says that cli clears the interrupt flag and sti sets it, and that the interrupt flag does not affect non-maskable interrupts (NMI) or software interrupts generated by the int instruction.

In the tutorial I follow I have this code:

    mov ax, 0x8000
    cli             ; disable hardware interrupts
    mov ss, ax      ; set the stack segment
    mov sp, 0xF000  ; set the stack pointer
    sti             ; re-enable hardware interrupts
  • My tests lead me to say that cli and sti are useless in the example given in the course: after several tests I was able to verify that the results are always the same whether I include cli and sti or remove them.

  • The explanations of the usefulness of cli and sti that people have given me for this example are purely theoretical: you have to put cli and sti for safety, to avoid bugs/crashes. One person on Discord says there's a one-in-a-million chance that something goes wrong when I initialize the stack segment and stack pointer. Which means he could never have verified his theoretical explanation himself; he just accepts the theory, with no curiosity to go further and experiment, and it is impossible to verify by practice since there is only a one-in-a-million chance of having a problem.

  • In the various documentation/sites there is strictly no practical example that really demonstrates what cli and sti do and how they are useful — just the documentation copied and pasted without sample code: cli sets the interrupt flag to 0, sti sets it to 1, and when the flag is clear hardware interrupts are ignored. Zero examples of use; nothing practical that lets me test this kind of thing. There is an example in some French documentation, but it is as useless as the one in the tutorial I follow: it initializes a segment with cli and sti around the line of code, and if you remove cli and sti the result is the same no matter what (maybe there is a one-in-a-million chance that a problem occurs if you remove them — great, so I can never check the theory in practice).

  • Another person on Discord tells me that he experimented a bit with all this, that he coded in assembly for a while, and that in his experience he understands why you have to put cli and sti: otherwise it can cause problems, so you put them, and that's all. When I ask him for a practical example (which should be right up his alley, since he has practiced), he doesn't give one because he is not at home, but he does send me yet another theoretical write-up explaining how it is useful. So apparently people can explain in great detail how useful it is, but never demonstrate that usefulness with a practical example in x86 16-bit assembly.

I should point out that I am not familiar with hardware interrupts. I have only tested software interrupts, which can be triggered with int.

I'm in kernel mode. I want a practical example where the code gets into trouble with hardware interrupts, then another example where cli and sti fix the problem.


Solution

  • Of course the whole point of cli/sti is to manage the handling of hardware interrupts, so you need some understanding of how hardware interrupts work in general.

    Here's a brief overview: a hardware device connected to the CPU can trigger a hardware interrupt (in the case of an 8086, by asserting the INTR pin on the CPU chip, and using other signals to indicate which interrupt vector should be called). When this occurs, if the interrupt flag is set, then the CPU completes the instruction that is currently executing, pushes FLAGS, CS and IP onto the stack, and jumps to the address specified in the appropriate entry of the interrupt vector table, which occupies the low 1024 bytes of memory (0000:0000 through 0000:03FF). The programmer should have previously set this entry to point to a block of code (the interrupt handler) that is to run in response. The interrupt handler does whatever is necessary to deal with the hardware interrupt and then executes IRET to return to whatever code was interrupted. Examples of devices causing hardware interrupts: a key pressed on the keyboard, a byte arriving on the serial port, the timer (the IBM PC configures its external timer chip to generate an interrupt at 18.2 Hz, i.e. every 55 ms).

    If the interrupt flag is clear, the interrupt is not serviced right away; the request is held pending, and the handler will be called once the flag is eventually set again.
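
    To make this concrete, here is a minimal sketch of installing your own handler for the timer interrupt, which is vector 08h on the IBM PC (MASM-style syntax; the label names are illustrative, and handler is a routine like the ones shown below). Each vector occupies 4 bytes in the table: the offset word first, then the segment word. Note that cli/sti is useful here as well: if the timer fired between the two writes, the CPU would jump through a half-updated vector.

    install_handler:
        xor ax, ax
        mov es, ax                              ; ES = 0000, the segment of the vector table
        cli                                     ; don't let an interrupt see a half-written vector
        mov word ptr es:[8*4], offset handler   ; vector 08h: offset word
        mov word ptr es:[8*4+2], cs             ; vector 08h: segment word
        sti

    (A real program would also save the old vector first, so it can chain to it or restore it on exit.)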


    So, you would clear the interrupt flag whenever you do not want an interrupt to occur. This would normally be because you are working with some resource that is shared between the current code and the interrupt handler, such that there would be a conflict if the interrupt handler were to run at this moment.

    For example, let's consider the timer interrupt. A simple handler might do nothing but increment a counter, so that the main thread of execution can tell how much time has passed. (The original IBM PC had no other built-in timekeeping hardware.) If a 16-bit counter is enough, you could simply have:

    ticks DW 0
    handler:
        inc word ptr [ticks]    ; one more 55 ms tick has elapsed
        iret                    ; (a real PC handler would also acknowledge the PIC)
    
    main_code:
        mov ax, [ticks] ; now ax contains the number of ticks
    

    But at 18.2 Hz, we get very close to 65536 ticks per hour (I think that's why the number 18.2 was chosen), so the counter will overflow about every hour. That is not good if you need to keep track of time intervals longer than that, so we should use a 32-bit counter instead. Since x86-16 has no 32-bit arithmetic instructions, we have to use an ADD/ADC pair. Our code could look like:

    ticks DD 0
    handler:
        add word ptr [ticks], 1
        adc word ptr [ticks+2], 0
        iret
    
    main_code:
        mov ax, [ticks]
        ;;; BUG what if interrupt occurs here ???
        mov dx, [ticks+2]
        ; now dx:ax contains the 32-bit number of ticks
    

    But this code has a bug. If by chance the timer interrupt should occur at the point marked BUG, the main code will get the low and high words of ticks out of sync. Suppose for instance that the value of ticks is 0x1234ffff. The main code loads the low word, 0xffff, into ax. Then the timer interrupt occurs and increments ticks, so that it is now 0x12350000. The interrupt handler returns and the main code does mov dx, [ticks+2], getting the value 0x1235. So now the main code has loaded the value 0x1235ffff, which is very wrong: it is an entire hour later than the actual time.

    We could fix this by using cli/sti to disable interrupts, so that an interrupt cannot occur at the site labeled BUG. Corrected code would look like:

    main_code:
        cli
        mov ax, [ticks]
        mov dx, [ticks+2]
        sti
    
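    One refinement worth noting: a bare sti is only correct if you know interrupts were enabled to begin with. If this code might itself be called with interrupts already disabled, the usual idiom (sketched here) is to save and restore FLAGS instead:

    read_ticks:
        pushf                   ; save the caller's FLAGS, including the interrupt flag
        cli
        mov ax, [ticks]
        mov dx, [ticks+2]
        popf                    ; restore the interrupt flag to whatever it was before
        ret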

    In the particular case of a 32-bit counter, there happen to be other ways to fix this issue without disabling interrupts, but you get the idea. You can imagine some more complex data structure that both the handler and the main code might need to use: an I/O buffer, some larger struct with information about an I/O event that just occurred, a linked list, etc.
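
    For instance, one classic way (sketched here) is to keep re-reading until two reads of the high word agree; since the handler only ever increments ticks, a stable high word means the low/high pair is consistent:

    read_ticks_retry:
        mov dx, [ticks+2]       ; read the high word
        mov ax, [ticks]         ; read the low word
        cmp dx, [ticks+2]       ; did the high word change in between?
        jne read_ticks_retry    ; yes: an interrupt got in the way, try again
        ; now dx:ax is a consistent 32-bit tick count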


    The CPU's registers are also a shared resource; that is the point of the SS:SP example that you noticed. Suppose the stack is currently at 1234:5678 and the main code wants to switch it to 2222:4444. You would think to do:

    switch_stack:
        mov ax, 0x2222
        mov ss, ax
        ;;; BUG: what if interrupt occurs here?
        mov sp, 0x4444
    

    If an interrupt were to occur at line BUG, the value of SS:SP would be 2222:5678, and this is where the CPU would push the CS/IP/FLAGS values before jumping to the handler. This would be really bad, since that isn't the correct location of either the old or the new stack. There might be important data at that address, which the CPU is now overwriting, and so we are now going to have a hard-to-reproduce memory corruption bug on our hands.

    So we would likewise think to fix it with

    switch_stack:
        mov ax, 0x2222
        cli
        mov ss, ax
        ;;; interrupt can't occur here!
        mov sp, 0x4444
        sti
    

    Now it so happens that this is actually a special case. Since this is a situation where forgetting to disable interrupts would be particularly nasty, the 8086 designers decided to do a little favor for programmers. The mov ss, reg instruction has a very special feature: it automatically inhibits interrupts until after the next instruction. So in fact, if you code mov ss, ax followed immediately by mov sp, 0x4444, an interrupt cannot occur between them, and the code is actually safe without cli/sti.
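
    In other words, a sketch of the pattern relying on this feature alone (safe on chips where the feature works; see the caveat below):

    switch_stack:
        mov ax, 0x2222
        mov ss, ax      ; interrupts are inhibited until after the next instruction
        mov sp, 0x4444  ; so this pair cannot be split by an interrupt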

    But let me emphasize again that this is a unique special case. I believe it is only mov ss, reg and pop ss that have such functionality, so examples like the 32-bit ticks counter really do need cli/sti. And in fact, if you had reversed the two instructions and coded mov sp, 0x4444 followed by mov ss, ax (which on its face would appear just as good), you would again have a bug, and the interrupt handler could be called with the stack pointing to 1234:4444. Also, as @ecm noted in a comment, some early 8086/8088 chips had a hardware bug (?) where the "disable interrupts for one instruction" feature didn't work, so on such chips you would also have to use cli/sti. (Or maybe this feature was not actually part of the spec until later?)

    The 386 added an lss instruction to load both the stack segment and stack pointer in a single instruction, which was a more robust way to address this issue. It was also more important in that case because in virtual 8086 mode, cli/sti would not execute directly but instead would trap to the operating system, which was very slow and best to avoid if possible.
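
    A sketch of that variant (new_stack is just an illustrative variable holding the far pointer; note that the offset word comes first in memory):

    new_stack dd 0x22224444     ; far pointer: offset 0x4444 in the low word, segment 0x2222 in the high word
    
    switch_stack_386:
        lss sp, [new_stack]     ; loads SS:SP in one uninterruptible instruction; no cli/sti needed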


    You suggest that maybe this has such low probability that we wouldn't really need to worry about it. Let's look at our 32-bit timer example and imagine an application with an "alarm clock" feature. While doing other work, it periodically checks the ticks counter, let's say about 100 times per second, to see if a specified time has passed, and if so, it does something to alert the user. If you leave out the cli/sti, then if an interrupt occurs there with the low word equal to 0xffff (which happens once per hour), it will think the time is one hour later than it is, and so may issue an alert up to one hour too soon. (If you want to be more dramatic, replace "issue an alert" with "activate dangerous machinery", "fire missiles", etc.)

    The mov ax, mem instruction on the 8086 took 10 clock cycles, and we make 100 checks per second, so there are about 1000 clock cycles per second during which we are vulnerable. The original IBM PC was clocked at 4.77 MHz, so at the top of every hour we have about a 1000/4,770,000 ≈ 1/4700 chance of the bug triggering. If you ship your application to 50,000 users, and each of them uses it for 8 hours per day, 5 days a week, that is 2,000,000 user-hours in the first week, so you can expect to receive about 2,000,000/4700 ≈ 425 complaints about this bug within the first week of release. Your boss is going to be pretty mad.

    And remember, we are back in the mid-1980s and there is no Internet, so you're going to have to mail every one of your 50,000 customers a floppy disk with the patch. At the cost of a couple dollars plus postage, this bug has cost the company about $100,000. In contrast, your salary as an entry-level programmer in 1984 is about $20,000 per year. How do you like your odds of keeping your job?