Tags: assembly, x86, interrupt, x86-16, gnu-assembler

Writing interrupt handler in x86 real mode assembly


I am learning interrupt handling in x86 real mode using assembly. I am following the example below, taken from here:

.include "common.h"
BEGIN
    CLEAR
    /* Set address of the handler for interrupt 0. */
    movw $handler, 0x00
    /* Set code segment of the handler for interrupt 0. */
    mov %cs, 0x02
    int $0
    PUTC $'b
    hlt
handler:
    PUTC $'a
    iret

But when I assemble, link, and run the code above:

$ as --32 -o main.o main.S -g
$ ld -T linker.ld -o main.img --oformat binary -m elf_i386 -nostdlib main.o
$ qemu-system-i386 -hda main.img

I get the following error:

qemu-system-i386: Trying to execute code outside RAM or ROM at 0xf00fff53
This usually means one of the following happened:

(1) You told QEMU to execute a kernel for the wrong machine type, and it crashed on startup (eg trying to run a raspberry pi kernel on a versatilepb QEMU machine)
(2) You didn't give QEMU a kernel or BIOS filename at all, and QEMU executed a ROM full of no-op instructions until it fell off the end
(3) Your guest kernel has a bug and crashed by jumping off into nowhere

This is almost always one of the first two, so check your command line and that you are using the right type of kernel for this machine.
If you think option (3) is likely then you can try debugging your guest with the -d debug options; in particular -d guest_errors will cause the log to include a dump of the guest register state at this point.

Execution cannot continue; stopping here.

What am I missing here? Why is mov %cs, 0x02 needed, and what is it really doing?

I tried debugging this under GDB. When I stepped through the code line by line, I did not get this error, which is weird; I am still investigating.

EDIT

This is how BEGIN is defined:

.macro BEGIN
    .local after_locals
    .code16
    cli
    /* Set %cs to 0. */
    ljmp $0, $1f
    1:
    xor %ax, %ax
    /* We must zero %ds for any data access. */
    mov %ax, %ds
    mov %ax, %es
    mov %ax, %fs
    mov %ax, %gs
    mov %ax, %bp
    /* Automatically disables interrupts until the end of the next instruction. */
    mov %ax, %ss
    /* We should set SP because BIOS calls may depend on that. TODO confirm. */
    mov %bp, %sp
    /* Store the initial dl to load stage 2 later on. */
    mov %dl, initial_dl
    jmp after_locals
    initial_dl: .byte 0
after_locals:
.endm

Solution

  • I can only assume that you have introduced bugs into the code originally presented in the tutorial. For instance, you say you assemble with:

    as --32 -o main.o main.S -g
    

    If you include common.h as it appears in the tutorial, this command should fail with something like:

    common.h: Assembler messages:
    common.h:399: Warning: stray `\'
    common.h:400: Warning: stray `\'
    common.h:401: Warning: stray `\'
    common.h:421: Warning: stray `\'
    common.h:422: Warning: stray `\'
    common.h:423: Warning: stray `\'
    common.h:424: Warning: stray `\'
    common.h:425: Warning: stray `\'
    

    These errors occur because the tutorial code is written so that the C preprocessor must be run on the assembly code first. The easiest way to do that is to assemble the code with GCC, which then hands it to the backend AS assembler:

    gcc -c -g -m32 -o main.o main.S
    

    GCC takes any file with a .S extension and runs the C preprocessor on it before passing the result to the AS assembler. Alternatively, you can run the C preprocessor directly with cpp and then run as separately.

    To build main.img with GCC you'd use commands like these:

    gcc -c -g -m32 -o main.o main.S
    ld -T linker.ld -o main.img --oformat binary -m elf_i386 -nostdlib main.o
    

    To build it by running the C preprocessor separately, you could do:

    cpp main.S > main.s
    as -g --32 -o main.o main.s
    ld -T linker.ld -o main.img --oformat binary -m elf_i386 -nostdlib main.o
    

    The code worked as expected when run with QEMU using:

    qemu-system-i386 -hda main.img
    

    The output should appear similar to:

    [Screenshot: the QEMU window showing ab on a cleared screen]

    Question regarding CS and the Real Mode IVT

    You inquired about this code:

    /* Set address of the handler for interrupt 0. */
    movw $handler, 0x00
    /* Set code segment of the handler for interrupt 0. */
    mov %cs, 0x02
    int $0
    

    In real mode, the default IBM PC Interrupt Vector Table (IVT) occupies the first 1024 bytes of memory, from physical address 0x00000 (0x0000:0x0000) up to, but not including, 0x00400 (0x0000:0x0400). Each entry in the IVT is 4 bytes (4 bytes per entry * 256 interrupts = 1024 bytes): a word (2 bytes) holding the Instruction Pointer (IP), also referred to as the offset, of the interrupt handler, followed by a word (2 bytes) holding its segment.

    Interrupt 0 starts at the very bottom of the IVT at memory address 0x00000 (0x0000:0x0000), Interrupt 1 starts at 0x00004 (0x0000:0x0004), and so on up to Interrupt 255, which starts at 0x003FC (0x0000:0x03FC).

    The instruction:

    /* Set address of the handler for interrupt 0. */
    movw $handler, 0x00
    

    Moves the 16-bit offset of handler to memory address DS:0x0000. With 16-bit addressing, DS is the implied segment unless register BP appears in the memory reference (e.g. (%bp)), in which case the segment is assumed to be SS.

    DS is set to 0x0000 in the BEGIN macro, so DS:0x00 is 0x0000:0x0000, which is the IP (offset) portion of Interrupt 0's segment:offset address. The instruction:

    /* Set code segment of the handler for interrupt 0. */
    mov %cs, 0x02
    

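    CS is set to 0x0000 in the BEGIN macro (by ljmp $0, $1f). This instruction copies CS (0x0000) to memory address DS:0x02 (0x0000:0x0002), which is the segment portion of Interrupt 0's address. Storing the segment is what makes the vector usable: the linker script sets the location counter to 0x7c00, so the offset of handler is only correct relative to segment 0x0000, and that is exactly the value CS holds. After this instruction, the IVT entry for Interrupt 0 points at the handler code in our boot sector. The instruction: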

    int $0
    

    Invokes interrupt 0, which now points at handler. The handler should display a on the screen, and its iret then returns to the code after int $0, which prints b and then halts.


    Code for Minimal Complete Verifiable Example

    Your question lacks a minimal complete verifiable example. I modified common.h to include only the macros needed by the code you wrote and kept everything else the same:

    linker.ld:

    SECTIONS
    {
        /* We could also pass -Ttext 0x7C00 to ld instead of doing this.
         * If your program does not have any memory accesses, you can omit this.
         */
        . = 0x7c00;
        .text :
        {
            __start = .;
    
            /* We are going to stuff everything
             * into a text segment for now, including data.
             * Who cares? Other segments only exist to appease C compilers.
             */
            *(.text)
    
            /* Magic bytes. 0x1FE == 510.
             *
             * We could add this on each Gas file separately with `.word`,
             * but this is the perfect place to DRY that out.
             */
            . = 0x1FE;
            SHORT(0xAA55)
    
            /* This is only needed if we are going to use a 2 stage boot process,
             * e.g. by reading more disk than the default 512 bytes with BIOS `int 0x13`.
             */
            *(.stage2)
    
            /* Number of sectors in stage 2. Used by the `int 13` to load it from disk.
             *
             * The value gets put into memory as the very last thing
             * in the `.stage2` section if it exists.
             *
             * We must put it *before* the final `. = ALIGN(512)`,
             * or else it would fall out of the loaded memory.
             *
             * This must be absolute, or else it would get converted
             * to the actual address relative to this section (7c00 + ...)
             * and linking would fail with "Relocation truncated to fit"
             * because we are trying to put that into al for the int 13.
             */
            __stage2_nsectors = ABSOLUTE((. - __start) / 512);
    
            /* Ensure that the generated image is a multiple of 512 bytes long. */
            . = ALIGN(512);
            __end = .;
            __end_align_4k = ALIGN(4k);
        }
    }
    

    common.h:

    /* I really want this for the local labels.
     *
     * The major downside is that every register passed as argument requires `<>`:
     * http://stackoverflow.com/questions/19776992/gas-altmacro-macro-with-a-percent-sign-in-a-default-parameter-fails-with-oper/
     */
    .altmacro
    
    /* Helpers */
    
    /* Push registers ax, bx, cx and dx. Lightweight `pusha`. */
    .macro PUSH_ADX
        push %ax
        push %bx
        push %cx
        push %dx
    .endm
    
    /* Pop registers dx, cx, bx, ax. Inverse order from PUSH_ADX,
     * so this cancels that one.
     */
    .macro POP_DAX
        pop %dx
        pop %cx
        pop %bx
        pop %ax
    .endm
    
    
    /* Structural. */
    
    /* Setup a sane initial state.
     *
     * Should be the first thing in every file.
     *
     * Discussion of what is needed exactly: http://stackoverflow.com/a/32509555/895245
     */
    .macro BEGIN
        LOCAL after_locals
        .code16
        cli
        /* Set %cs to 0. TODO Is that really needed? */
        ljmp $0, $1f
        1:
        xor %ax, %ax
        /* We must zero %ds for any data access. */
        mov %ax, %ds
        /* TODO is it really necessary to clear all those segment registers, e.g. for BIOS calls? */
        mov %ax, %es
        mov %ax, %fs
        mov %ax, %gs
        /* TODO What to move into BP and SP?
         * http://stackoverflow.com/questions/10598802/which-value-should-be-used-for-sp-for-booting-process
         */
        mov %ax, %bp
        /* Automatically disables interrupts until the end of the next instruction. */
        mov %ax, %ss
        /* We should set SP because BIOS calls may depend on that. TODO confirm. */
        mov %bp, %sp
        /* Store the initial dl to load stage 2 later on. */
        mov %dl, initial_dl
        jmp after_locals
        initial_dl: .byte 0
    after_locals:
    .endm
    
    /* BIOS */
    
    .macro CURSOR_POSITION x=$0, y=$0
        PUSH_ADX
        mov $0x02, %ah
        mov $0x00, %bh
        mov \x, %dh
        mov \y, %dl
        int $0x10
        POP_DAX
    .endm
    
    /* Clear the screen, move to position 0, 0. */
    .macro CLEAR
        PUSH_ADX
        mov $0x0600, %ax
        mov $0x7, %bh
        mov $0x0, %cx
        mov $0x184f, %dx
        int $0x10
        CURSOR_POSITION
        POP_DAX
    .endm
    
    /* Print an 8-bit ASCII value at the current cursor position.
     *
     * * `c`: r/m/imm8 ASCII value to be printed.
     *
     * Usage:
     *
     * ....
     * PUTC $'a
     * ....
     *
     * prints `a` to the screen.
     */
    .macro PUTC c=$0x20
        push %ax
        mov \c, %al
        mov $0x0E, %ah
        int $0x10
        pop %ax
    .endm
    

    main.S:

    .include "common.h"
    BEGIN
        CLEAR
        /* Set address of the handler for interrupt 0. */
        movw $handler, 0x00
        /* Set code segment of the handler for interrupt 0. */
        mov %cs, 0x02
        int $0
        PUTC $'b
        hlt
    handler:
        PUTC $'a
        iret
    

    Recommendations

    GDB (the GNU Debugger) doesn't understand real mode segment:offset addressing, so debugging real mode code with GDB is very problematic and not something I recommend. Consider using BOCHS to debug real mode code instead: it understands real mode and segment:offset addressing, and it is better suited for debugging bootloaders or any code that runs before entering 32-bit protected mode or long mode.