assembly, x86-16, masm, real-mode, memory-segmentation

MASM x86: ret returns to the wrong address


I have assembly code that loads a COM file into memory and runs it. The COM file is loaded into a separate data segment; the loader then sets DS and SS to that segment and calls into it to run the COM file. My data and stack segments are:

STACKSEG SEGMENT  STACK 'stack'
    DW 512 DUP(?)
STACKSEG ENDS

DATASEG1 SEGMENT PARA  'data'
    
    stacksp dw 0
    stackbp dw 0
DATASEG1 ENDS

DATASEG2 SEGMENT PARA 'data'
WHERETOJ:
    ;stck db 32 dup (?)
    txt db 4096 dup (?)  , "$"
    
    ;string B
DATASEG2 ENDS

and my caller is:

    mov stacksp, sp
    mov stackbp, bp
    ASSUME DS: DATASEG2, SS:DATASEG2
    mov ax, DATASEG2
    mov DS, AX
    mov ES, AX
    MOV SS, AX

    mov SP, 0FFFFh
    call far ptr WHERETOJ ; (Now registers are SP:FFFF, BP:0, SI:0C40, DI:0C16, DS:1820,ES:1820, SS:1820, CS:192D, IP:00BD)

    ASSUME DS: DATASEG1, SS:STACKSEG
    mov ax, STACKSEG
    MOV SS, AX

    MOV AX, DATASEG1
    MOV DS, AX
    MOV ES, AX

    mov bx, offset stackbp
    mov bp, [bx]
    mov bx, offset stacksp
    mov sp, [bx]

    ret

The COM file just runs int 10h to print a character and should return to the caller:

        mov ax, 0945h  ; (Now registers are changed to SP:FFFB, BP:0, SI:0C40, DI:0C16, DS:1820,ES:1820, SS:1820, CS:1820, IP:0000)
        mov bx, 0006
        mov cx, 40
        int 10h
        
        ret ;; ( registers are   SP:FFFB, BP:0, SI:0C40, DI:0C16, DS:1820,ES:1820, SS:1820, CS:1820, IP:000B)

;; after ret has run, the registers are: SP:FFFD, BP:0, SI:0C40, DI:0C16, DS:1820,ES:1820, SS:1820, CS:1820, IP:00C2

The problem is that when the COM file runs, its ret does not return to the caller but jumps to an unexpected address.


Solution

  • Your primary issue is that the RET you are performing is a NEAR return. A NEAR return in real mode will result in a 16-bit offset being popped off the stack and the IP (Instruction Pointer) being set to that value. The segment will not change.

    Your code:

    call far ptr WHERETOJ
    

    is a FAR CALL: it pushes the 16-bit Code Segment (CS) followed by the 16-bit IP. The NEAR RETURN pops only the 16-bit IP and leaves the segment on the stack.

    At the point of the FAR CALL you said the registers had:

    SP:FFFF, SS:1820, CS:192D, IP:00BD

    A FAR CALL pushes CS followed by the IP of the instruction after the CALL. This FAR CALL is encoded as a 5-byte instruction, so the address pushed on the stack is 192Dh:00C2h (00BDh+5=00C2h). When you did the NEAR return, CS didn't change but IP became 00C2h, and only 2 bytes were popped off the stack. This is why you saw this in the debugger after the RET instruction executed:

    SP:FFFD, SS:1820, CS:1820, IP:00C2

    SP was incremented by 2, from 0FFFBh to 0FFFDh. CS stayed the same and IP was set to 00C2h. The resulting CS:IP pair (1820h:00C2h) is incorrect, so you ended up executing memory you didn't intend to. If you replace RET with RETF (FAR RETURN), your code should work as expected and the registers would have these values after RETF executes:

    SP:FFFF, SS:1820, CS:192D, IP:00C2
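
    As a rough sketch of that fix, here is the routine from the question with only its last instruction changed. The stack addresses in the comments are worked out from the register values quoted above (SP=0FFFFh before the call), and the register descriptions for INT 10h/AH=09h are added for illustration. Note that a routine ending in RETF can only be entered with a FAR CALL, so it is no longer a conventional DOS COM program.

    ```asm
    ; Stack in segment 1820h right after "call far ptr WHERETOJ":
    ;   1820h:FFFDh  192Dh   <- caller's CS (pushed first)
    ;   1820h:FFFBh  00C2h   <- caller's return IP (pushed second), SP=FFFBh
    WHERETOJ:
        mov ax, 0945h        ; AH=09h write character/attribute, AL=45h ('E')
        mov bx, 0006h        ; BH=page 0, BL=attribute 06h
        mov cx, 40           ; repeat count
        int 10h

        retf                 ; pops IP=00C2h, then CS=192Dh; SP returns to 0FFFFh
    ```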


    DOS COM programs

    You use the term COM program, but your code suggests it is more likely a flat binary with an ORG (origin point) of 0000h rather than the ORG of 0100h used by DOS COM programs. To be compatible with DOS COM, you have to load the code and data at an offset 256 bytes from the beginning of the segment the program will run from. In a DOS COM program, the first instruction executed is at CS:0100h, not CS:0000h.

    In a typical DOS COM program the DOS loader pushes 0000h on the top of the stack. If you do a NEAR RET that will start executing at CS:0000h. The first 256 bytes contain the DOS Program Segment Prefix (PSP). The first 2 bytes of the PSP (and thus CS:0000h) are an INT 20H instruction.


    INT 20h will terminate a DOS COM program and return an ERRORLEVEL of 0 to the DOS command prompt that launched the program. INT 20h should not be used to exit DOS EXE programs; use INT 21h/AH=4Ch instead.
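
    For comparison, a conventional DOS COM program that relies on this behaviour might look like the minimal sketch below. The names CODE and start are placeholders chosen for illustration, and it assumes the result is built as a COM image (for example, linked as a tiny-model binary) and launched by DOS itself, so that the PSP and the 0000h word on the stack are already in place.

    ```asm
    CODE SEGMENT
        ASSUME CS:CODE, DS:CODE, ES:CODE, SS:CODE
        ORG 100h             ; DOS loads a COM image right after the 256-byte PSP
    start:
        mov ah, 02h          ; INT 21h/AH=02h: write the character in DL to stdout
        mov dl, 'E'
        int 21h

        ret                  ; NEAR RET pops the 0000h pushed by the DOS loader,
                             ;     so execution continues at CS:0000h (INT 20h)
    CODE ENDS

    END start
    ```

    The NEAR RET works here only because DOS prepared both the return offset and the INT 20h at offset 0000h; a custom loader has to supply an equivalent.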

    If your intention is to use a NEAR RETURN to exit your program the way DOS allows, you will have to provide a mechanism (code) inside the segment to do that. Since you don't have a DOS PSP (or have chosen not to use one), you will have to find a place to put such code within the segment. The easiest mechanism is to create a code trampoline on the stack before you start executing the program: push the bytes of a FAR CALL (or FAR JMP) onto the stack that takes you back to the instruction after the call far ptr WHERETOJ instruction.

    A FAR CALL is encoded on the stack as:

    9A oooo ssss

    Where 9A is the opcode for a FAR CALL, oooo is the offset to call, and ssss is the segment to use. We want to keep the stack pointer (SP) on an even alignment (see footnote 1), so we add a NOP instruction in front for a total of 6 bytes. In memory, from the lowest address up, the stack would look like:

    90 9A oooo ssss

    Once you have the NOP+FAR CALL built on the stack, you need to push the offset of that code so that a NEAR RET will end up calling it when executed.

    The path of execution is then: the NEAR RET (ret) in the COM program pops the pushed offset into IP, so execution continues at the NOP+FAR CALL on the stack; the FAR CALL then transfers control to the instruction after the FAR JMP that started the COM program in the first place. You could encode a NOP+FAR JMP on the stack instead, but the NOP+FAR CALL has the advantage of pushing the value of CS on the stack, which could be useful later on, especially if you load more than one COM program in memory.

    A sample program written to run on 8086 or later processors could look like this:

    .8086
    
    STACKSEG SEGMENT  STACK 'stack'
        DW 512 DUP(?)
    STACKSEG ENDS
    
    DATASEG1 SEGMENT PARA  'data'
        finstr db 0dh, 0ah, 'Returned from COM program', 0dh, 0ah, '$'    
        stacksp dw 0
        stackbp dw 0
    DATASEG1 ENDS
    
    DATASEG2 SEGMENT PARA 'data'
    WHERETOJ:
        mov ax, 0945h
        mov bx, 0057h
        mov cx, 40
        int 10h   
        ret
        org 65536            ; Expand the segment to 64KiB
    DATASEG2 ENDS
    
    CODESEG1 SEGMENT PARA 'code'
    main:
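        ASSUME DS: DATASEG1, SS:STACKSEG
        mov AX, DATASEG1     ; At EXE entry DS points at the PSP, so load DS
        mov DS, AX           ;     with DATASEG1 before touching its variables
        mov [stacksp], SP
        mov [stackbp], BP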
    
        ASSUME DS: DATASEG2, SS:DATASEG2
        mov AX, DATASEG2
        mov DS, AX
        mov ES, AX           ; DS=ES=SS=DATASEG2
    
        ; CLI                ; If running on BUGGY 8088 you would need to have CLI/STI
        MOV SS, AX
        xor SP, SP           ; SP = 0. Grow down from top of 64KiB SS segment
        ; STI
    
        ; Build FAR CALL on the COM programs stack
        ; to return to this code when NEAR RET done
        push CS              ; Put CS on stack as part of FAR CALL
        mov AX, offset aftercom
                             ; Push the IP of the instruction after the FAR JMP below
        push AX
        mov AX, 09a90h       ; Put a NOP(90h) on the stack and 9AH (FAR CALL opcode)
        push AX              ; NOP used as padding to keep SP aligned on an even address
    
        mov AX, SP
        push AX              ; Push a copy of SP on the stack. SP is the address of the
                             ;     NOP
                             ;     CALL FAR PTR segment:offset instruction built on stack
    
        jmp far ptr WHERETOJ ; Start executing our program code
    aftercom:
        add SP, 10           ; When we return the stack has 10 bytes on it (6 bytes
                             ;     for FAR CALL and the NOP + 4 bytes of the CALLers
                             ;     IP and CS). Clean them up
    
        ASSUME DS: DATASEG1, SS:STACKSEG
        MOV AX, DATASEG1
        MOV DS, AX
        MOV ES, AX
    
        mov AX, STACKSEG
        ; CLI                ; If running on BUGGY 8088 you would need to have CLI/STI
        MOV SS, AX           ; Restore SS:SP one after another since interrupts
                             ;     will be off until the instruction after changing SS
        mov SP, [stacksp]
        ; STI
        mov BP, [stackbp]
       
        mov AH, 09           ; Display a string saying we returned
        mov DX, offset finstr
        int 21h
    
        mov AX, 4c00h        ; Exit DOS EXE program with ERRORLEVEL 0
        int 21h
    CODESEG1 ENDS
    
    END main
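
    The NOP+FAR JMP alternative mentioned earlier would change only the opcode word and the stack clean-up. The fragment below is a sketch of the lines that would differ, reusing the labels WHERETOJ and aftercom from the sample program and assuming the rest of that program stays as it is. Because a FAR JMP pushes nothing, only the 6 trampoline bytes are left on the stack at aftercom.

    ```asm
        push CS              ; Segment operand of the FAR JMP
        mov AX, offset aftercom
        push AX              ; Offset operand of the FAR JMP
        mov AX, 0EA90h       ; NOP (90h) followed by EAh (FAR JMP opcode)
        push AX              ; Stack now holds: 90 EA oooo ssss

        mov AX, SP
        push AX              ; Near return address = address of the NOP

        jmp far ptr WHERETOJ ; Start executing the program code
    aftercom:
        add SP, 6            ; The NEAR RET already popped its 2 bytes, so
                             ;     only the 6 trampoline bytes need removing
    ```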
    

    A properly functioning version of the code would look similar to this when run:

    [Screenshot of the program's output]


    Footnotes

    • 1. In real mode you should keep the stack pointer (SP) aligned to an even offset for performance reasons. Rather than using 0FFFFh as the starting stack address, use 0000h instead. The first PUSH will wrap SP to 0FFFEh (0000h-0002h=0FFFEh) before writing the value on the stack.