this assembly code enters the protected mode from bootloader but fails to reset CS segment (or do the far jump) after the far jump is called and reboots. If I remove the far jump it goes into the infinite loop (0x66,jmp $) in protected mode just fine, without rebooting.
[bits 16]
[org 0x7c00]
xor ax,ax
xor eax,eax
add eax,ENTRY_POINT_32 ;address to plug to far jmp
mov [ENTRY_OFF],eax
xor eax,eax
mov eax,GDT ;load GDT label address
mov [GDTR+2],eax ; load it into address space in GDTR
lgdt [GDTR] ;load GDTR
cli ;turn off masked interrupts
in al,0x70
or al,0x80
out 0x70,al ;turn off nonmasked interrupts
in al,0x92
or al,2
out 0x92, al ;open line A20 (change address 20 to 32 bits)
mov eax,cr0
or al,1
mov cr0,eax ;switch to protected mode
db 0x66 ;prefix of opcode to change bitness
db 0xEA ;opcode of jmp far
ENTRY_OFF dd 0x0 ;32 bit offset of 32 bit instructions
dw 00001000b ; selector 1st descriptor CODE_descr,=1
ENTRY_POINT_32:
db 0x66 ;prefix of opcode to change bitness
jmp $ ;infinite jump to the same location
GDT:
NULL_descr dd 0x0,0x0 ; must be present in GDT
CODE_descr db 0xFF,0xFF,0x0,0x0,0x0,10011010b,11001111b,0x0
;descriptor of 32 bit code segment, base 0, size ffffffff
DATA_descr db 0xFF,0xFF,0x0,0x0,0x0,10010010b,11001111b,0x0
;descriptor of 32 bit data segment, base 0, size ffffffff
VIDEO_descr 0xFF,0xFF,0x0,0x80,0x0B,10010010b,01000000b,0x0
;descriptor of video buffer, base 0x000B8000, size ffff
GDT_size db $-GDT ;size of GDT table
GDTR dw GDT_size-1 ;next 3 words are size &
dd 0x0 ;address of beginning of GDT, loaded in code
times 510 - ($ - $$) db 0
dw 0xaa55
Original code from wasm.in, with slight modifications.
In real mode there is an implied segment on all memory operands. If the memory operand doesn't contain BP as a base then the implied segment is DS. If the memory operand does contain BP the implied base is SS. Your memory operands don't use BP so the implied segment is DS. Instructions with a memory operand like this:
mov [ENTRY_POINT_32],eax
Are equivalent to:
mov [ds:ENTRY_POINT_32],eax
Real mode uses segment:offset addressing to arrive at a physical memory address. If DS is wrong you will be writing to the wrong memory location. 20-bit Physical Address = (segment<<4)+offset.
With that being said, when your bootloader starts you can't rely on the segment and general purpose registers being the values you expect with the exception of DL which contains the boot drive passed by the BIOS. You can read my Bootloader Tips for more information on bootloader development.
You need to explicitly set the DS register. Since your code is using org 0x7c00
you need a DS segment set to zero. (0x0000<<4)+0x7c00 = 0x07c00 (physical address). The bootloader is always loaded by the BIOS to physical address 0x07c00.
You also have these two lines:
xor ax,ax
xor eax,eax
The first is unnecessary since you set all of EAX to zero with the latter. The following line is unnecessary if you use the bits 32
NASM directive before your 32-bit code:
db 0x66 ;prefix of opcode to change bitness
The GDTR is also set up incorrectly. You compute the size improperly. You have this code:
GDT_size db $-GDT ;size of GDT table
GDTR dw GDT_size-1 ;next 3 words are size &
You create a memory location with a byte containing the size of the GDT. GDTR dw GDT_size-1
takes the offset of the label GDT_size
and subtract one from it. This only works because the offset of label GDT_size
is greater than the size of the GDT. You could do something like:
GDT:
NULL_descr: dd 0x0,0x0 ; must be first entry in GDT
; descriptor of 32 bit code segment, base 0, size ffffffff
CODE_descr: db 0xFF,0xFF,0x0,0x0,0x0,10011010b,11001111b,0x0
; descriptor of 32 bit data segment, base 0, size ffffffff
DATA_descr: db 0xFF,0xFF,0x0,0x0,0x0,10010010b,11001111b,0x0
; descriptor of video buffer, base 0x000B8000, size ffff
VIDEO_descr: db 0xFF,0xFF,0x0,0x80,0x0B,10010010b,01000000b,0x0
GDT_END:
GDTR dw GDT_END-GDT-1 ; Size of GDT (minus 1)
dd 0x0 ; address of beginning of GDT, loaded in code
When creating self-modifying code you also need to concern yourself with clearing the instruction prefetch queue to ensure the processor sees the changes to the code. The processor may have already pre-read the FAR JMP instruction you are modifying and isn't aware of the change you made to the code. That can be rectified by simply inserting a JMP into the code after you modify the instruction. After you update the instruction with the computed address you can do something like:
mov [ENTRY_OFF],eax
jmp clear_prefetch ; Clear the instruction prefetch queue
; by jumping to next instruction
clear_prefetch:
Working code (I have cleaned up the formatting a bit) could look like:
bits 16
org 0x7c00
start:
xor eax,eax
mov ds, ax ; Explicitly set DS to zero
add eax,ENTRY_POINT_32 ; address to plug to far jmp
mov [ENTRY_OFF],eax
jmp clear_prefetch ; Clear the instruction prefetch queue
; by jumping to next instruction
clear_prefetch:
xor eax,eax
mov eax,GDT ; load GDT label address
mov [GDTR+2],eax ; load it into address space in GDTR
lgdt [GDTR] ; load GDTR
cli ; turn off masked interrupts
in al,0x70
or al,0x80
out 0x70,al ; turn off nonmasked interrupts
in al,0x92
or al,2
out 0x92, al ; enable A20 line
mov eax,cr0
or al,1
mov cr0,eax ; switch to protected mode
db 0x66 ; prefix of opcode to change bitness
db 0xEA ; opcode of jmp far
ENTRY_OFF:
dd 0x0 ; 32 bit offset of 32 bit instructions
dw 00001000b ; selector 1st descriptor CODE_descr,=1
bits 32
ENTRY_POINT_32:
jmp $ ; infinite jump to the same location
GDT:
NULL_descr: dd 0x0,0x0 ; must be first entry in GDT
; descriptor of 32 bit code segment, base 0, size ffffffff
CODE_descr: db 0xFF,0xFF,0x0,0x0,0x0,10011010b,11001111b,0x0
; descriptor of 32 bit data segment, base 0, size ffffffff
DATA_descr: db 0xFF,0xFF,0x0,0x0,0x0,10010010b,11001111b,0x0
; descriptor of video buffer, base 0x000B8000, size ffff
VIDEO_descr: db 0xFF,0xFF,0x0,0x80,0x0B,10010010b,01000000b,0x0
GDT_END:
GDTR dw GDT_END-GDT-1 ; Size of GDT (minus 1)
dd 0x0 ; address of beginning of GDT, loaded in code
times 510 - ($ - $$) db 0
dw 0xaa55
Your code is overly complex for this situation. Legacy BIOSes on the x86 always load the bootloader to physical address 0x07c00. An advantage to using an ORG 0x7c00
and setting the segments to 0x0000 is that 0x0000:0x7c00 and the linear address (same as physical address in real mode) are the same offset of 0x07c00 from the start of memory. You can use this to your advantage and avoid unnecessary computation at run-time. The code could look like this:
bits 16
org 0x7c00
start:
xor ax,ax
mov ds,ax ; Explicitly set DS to zero
lgdt [GDTR] ; load GDTR
cli ; turn off masked interrupts
in al,0x70
or al,0x80
out 0x70,al ; turn off nonmasked interrupts
in al,0x92
or al,2
out 0x92, al ; enable A20 line
; Enter protected mode
mov eax,cr0
or al,1
mov cr0,eax ; switch to protected mode
jmp CODE32_SEL:ENTRY_POINT_32
bits 32
ENTRY_POINT_32:
mov eax, DATA32_SEL ; Set the protected mode selector
mov ds, ax
mov fs, ax
mov gs, ax
mov ss, ax
mov esp, 0x9C000 ; Set protected mode stack below EBDA
mov eax, VIDEO32_SEL ; Set the video memory selector
mov es, ax
; Print some characters to top left of the screen in white on magenta
xor ebx, ebx
mov word [es:ebx], 0x57 << 8 | 'M'
mov word [es:ebx+2], 0x57 << 8 | 'D'
mov word [es:ebx+4], 0x57 << 8 | 'P'
jmp $ ; infinite jump to the same location
GDT:
NULL_descr: dd 0x0,0x0 ; must be first entry in GDT
; descriptor of 32 bit code segment, base 0, size ffffffff
CODE_descr: db 0xFF,0xFF,0x0,0x0,0x0,10011010b,11001111b,0x0
; descriptor of 32 bit data segment, base 0, size ffffffff
DATA_descr: db 0xFF,0xFF,0x0,0x0,0x0,10010010b,11001111b,0x0
VIDEO_descr: db 0xFF,0xFF,0x0,0x80,0x0B,10010010b,01000000b,0x0
; descriptor of video buffer, base 0x000B8000, size ffff
GDT_END:
CODE32_SEL equ CODE_descr-GDT
DATA32_SEL equ DATA_descr-GDT
VIDEO32_SEL equ VIDEO_descr-GDT
GDTR dw GDT_END-GDT-1 ; Size of GDT (minus 1)
dd GDT ; address of beginning of GDT
times 510 - ($ - $$) db 0
dw 0xaa55
This code computes the CODE and DATA selectors at assembly time. It also computes the GDTR at assembly time and hard codes the FAR JMP. It should be noted that since the bootloader and the 32-bit entry point are entirely inside the first 64KiB of memory you can use a 16-bit offset rather than 32-bit in the FAR JMP to protected mode. There is no need for self modifying code.
Note: Creating a selector for the video memory isn't necessary. You can always address that memory using the 32-bit 4GiB flat data selector.
The concept of building up a FAR JMP, and generating the GDTR record at run time isn't entirely without merit. In environments where the code may be placed in memory at different segments then you'd need to compute the FAR JMP and the GDT linear address for the GDTR at run-time. This would be the case if you were trying to enter protected mode from DOS via a COM or EXE program. The DOS loader decides what segment to place things in. In that case you'd have to compute the addresses at run-time. I wrote some code a couple of years ago for someone on IRC that does just that. My code doesn't disable the NMIs (it should), and it doesn't modify the FAR JMP. What I do is build the FAR JMP address on the stack, and then do an indirect FAR JMP via the address on the stack. The principle is the same as doing the self modifying code.
A sample DOS COM program that does run-time generation of the address for the FAR JMP on the stack and generates the GDT address in the GDTR is as follows:
; Assemble with NASM as
; nasm -f bin enterpm.asm -o enterpm.com
STACK32_TOP EQU 0x200000
CODE32_REL EQU 0x110000
VIDEOMEM EQU 0x0b8000
use16
; COM program CS=DS=SS
org 100h
call check_pmode ; Check if we are already in protected mode
; This may be the case if we are in a VM8086 task.
; EMM386 and other expanded memory manager often
; run DOS in a VM8086 task. DOS extenders will have
; the same effect
jz not_prot_mode ; If not in protected mode proceed to switch
mov dx, in_pmode_str; otherwise print an error and exit back to DOS
mov ah, 0x9
int 0x21 ; Print Error
ret
not_prot_mode:
call a20_on ; Enable A20 gate (uses Fast method as proof of concept)
cli
; Compute linear address of label gdt_start
; Using (segment << 4) + offset
mov eax,cs ; EAX = CS
shl eax,4 ; EAX = (CS << 4)
mov ebx,eax ; Make a copy of (CS << 4)
add [gdtr+2],eax ; Add base linear address to gdt_start address
; in the gdtr
lgdt [gdtr] ; Load gdt
; Compute linear address of label code_32bit
; Using (segment << 4) + offset
add ebx,code_32bit ; EBX = (CS << 4) + code_32bit
push dword 0x08 ; CS Selector
push ebx ; Linear offset of code_32bit
mov bp, sp ; m16:32 address on top of stack, point BP to it
mov eax,cr0
or eax,1
mov cr0,eax ; Set protected mode flag
jmp dword far [bp] ; Indirect m16:32 FAR jmp with
; m16:32 constructed at top of stack
; DWORD allows us to use a 32-bit offset in 16-bit code
; 16-bit functions that run in real mode
; Check if protected mode is enabled, effectively checking if we are
; in in a VM8086 task. Set ZF to 1 if in protected mode
check_pmode:
smsw ax
test ax, 0x1
ret
; Enable a20 (fast method). This may not work on all hardware
a20_on:
cli
in al, 0x92 ; Read System Control Port A
test al, 0x02 ; Test current a20 value (bit 1)
jnz .skipfa20 ; If already 1 skip a20 enable
or al, 0x02 ; Set a20 bit (bit 1) to 1
and al, 0xfe ; Always write a zero to bit 0 to avoid
; a fast reset into real mode
out 0x92, al ; Enable a20
.skipfa20:
sti
ret
in_pmode_str: db "Processor already in protected mode - exiting",0x0a,0x0d,"$"
align 4
gdtr:
dw gdt_end-gdt_start-1
dd gdt_start
gdt_start:
; First entry is always the Null Descriptor
dd 0
dd 0
gdt_code:
; 4gb flat r/w/executable code descriptor
dw 0xFFFF ; limit low
dw 0 ; base low
db 0 ; base middle
db 0b10011010 ; access
db 0b11001111 ; granularity
db 0 ; base high
gdt_data:
; 4gb flat r/w data descriptor
dw 0xFFFF ; limit low
dw 0 ; base low
db 0 ; base middle
db 0b10010010 ; access
db 0b11001111 ; granularity
db 0 ; base high
gdt_end:
; Code that will run in 32-bit protected mode
; Align code to 4 byte boundary. code_32bit label is
; relative to the origin point 100h
align 4
code_32bit:
use32
; Set virtual memory address of pm code/data to CODE32_REL
; We will be relocating this section from low memory where DOS
; originally loaded it.
section protectedmode vstart=CODE32_REL, valign=4
start_32:
cld ; Direction flag forward
mov eax,0x10 ; 0x10 is flat selector for data
mov ds,eax
mov es,eax
mov fs,eax
mov gs,eax
mov ss,eax
mov esp,STACK32_TOP ; Should set ESP to a usable memory location
; Stack will be grow down from this location
mov edi,start_32 ; EDI = linear address where PM code will be copied
mov esi,ebx ; ESI = linear address of code_32bit
mov ecx,PMSIZE_LONG ; ECX = number of DWORDs to copy
rep movsd ; Copy all code/data from code_32bit to CODE32_REL
jmp 0x08:.relentry ; Absolute jump to relocated code
.relentry:
mov ah, 0x57 ; Attribute white on magenta
; Print a string to display
mov esi,str ; ESI = address of string to print
mov edi,VIDEOMEM ; EDI = base address of video memory
call print_string_attr
cli
endloop:
hlt ; Halt CPU with infinite loop
jmp endloop
print_string_attr:
push ecx
xor ecx,ecx ; ECX = 0 current video offset
jmp .loopentry
.printloop:
mov [edi+ecx*2],ax ; Copy attr and character to display
inc ecx ; Next word position
.loopentry:
mov al,[esi+ecx] ; Get next character to print
test al,al
jnz .printloop ; If it's not NUL continue
.endprint:
pop ecx
ret
str: db "Protected Mode",0
PMSIZE_LONG equ ($-$$+3)>>2
; Number of DWORDS that the protected mode
; code and data takes up (rounded up)
This code is a little more complex than I might give it credit for. The part of interest would be the pointer calculations in not_prot_mode
which are similar to the types of calculations your code is doing. After entering protected mode the code relocates itself above DOS at 0x00110000. That was a requirement of the person who originally asked me about switching into protected mode.
Note: This code only runs in an environment where protected mode is not already enabled. It will display an error and exit if being run inside a VM8086 task.