x86-16 processor memory-segmentation

Can someone help me with segmentation on Intel's 8086 microprocessor?


I am reading about the architecture of Intel's 8086 and can't figure out the following things about segmentation. I know that each segment register points to its own segment and contains the base address of a 64 KB segment. But who calculates the physical address, and at what point is it set in the segment registers? Also, since one physical address can be reached through multiple segment:offset pairs and segments can overlap, how can you be sure that you won't overwrite something? Where can I read more about this?


Solution

  • Generally speaking, the assembler only works with the offset part of a logical address; it leaves the segment part to you. For example, look at this code:

    start   lea si,[hello]          ; Load effective address of string
            mov word [ds:si+10],0   ; Zero-terminate string after 10th letter
            jmp $                    ; Loop endlessly
    
    ; Fill rest of the segment with 0s
    times 65536-($-$$) db 0x00
    
    hello   db "I'm just outside of the current segment. Hello!",0
    

    The assembler calculates the offset of 'hello' from the origin of the program. Since no origin is defined, 0x0 is assumed. In this case, however, the offset of 'hello' would be 0x10000, which does not fit in 16 bits. The assembler therefore truncates the address to 0x0000. It does not change any of the segment registers, although it will likely issue a warning such as test.asm:1: warning: word data exceeds bounds. When you run this program, the address of hello has wrapped around to the start of the segment, so the write to offset 10 lands on the jmp $ instruction rather than on the string. The loop is overwritten with zeroes and the CPU starts executing nothing but zeroes, which is not what you intended.
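
    The wrap-around can be checked with a few lines of arithmetic. This is only a sketch in Python: the helper name truncate_offset is made up for illustration and does not model the assembler's internals, only the effect of storing a value in a 16-bit field.

```python
# Sketch of 16-bit offset truncation. The helper name is hypothetical;
# it only models "keep the low 16 bits", not the assembler itself.


def truncate_offset(value: int) -> int:
    """Return the low 16 bits of value, as stored in a 16-bit offset field."""
    return value & 0xFFFF


# 'hello' sits after 65536 bytes of code and padding, so its offset is 0x10000.
hello_offset = 0x10000

si = truncate_offset(hello_offset)       # value that actually ends up in SI
write_target = truncate_offset(si + 10)  # offset that [ds:si+10] refers to

print(hex(si))            # 0x0
print(hex(write_target))  # 0xa
```

    The write therefore goes to offset 0x000A of the segment, which is inside the first few instructions rather than inside the string.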

    The overwrite described above only happens, of course, if the code segment and the data segment are the same. Who guarantees that this is the case? Nobody, really, especially since I still don't know what platform you are coding for. It is entirely your responsibility to set up the segment registers with correct values. The easiest way to do so is:

    push cs   ; Push address of code segment to stack
    pop ds    ; Pop address back into data segment
    push cs   ; Same for extra data segment
    pop es    ;
    

    This way you can be certain that you are accessing the offset in the correct data segment.

    Now, regarding 'How do you make sure the code segment doesn't overlap the data segment?': why shouldn't it? When your program, including its data, is smaller than 64 KB, making the code and data segments identical is actually the easiest way to access your data.
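
    As for who calculates the physical address: the 8086 does it itself on every memory access, by shifting the 16-bit segment value left by four bits (multiplying it by 16) and adding the 16-bit offset, which gives a 20-bit address. A segment register therefore holds the base address divided by 16, not the full physical address. The Python sketch below models that calculation; the function name and the segment value 0x1000 are assumptions chosen for illustration.

```python
# Sketch of 8086 real-mode address translation.
# The function name and the example segment value are illustrative only.


def physical_address(segment: int, offset: int) -> int:
    """Return segment * 16 + offset, wrapped to the 8086's 20 address bits."""
    return ((segment << 4) + offset) & 0xFFFFF


cs = ds = 0x1000  # assumed value, as if set up with push cs / pop ds

code_addr = physical_address(cs, 0x000A)  # an instruction at offset 10
data_addr = physical_address(ds, 0x000A)  # a data access at offset 10

print(hex(code_addr))                     # 0x1000a
print(code_addr == data_addr)             # True
print(hex(physical_address(ds, 0xFFFF)))  # 0x1ffff
```

    With CS and DS equal, the same offset refers to the same byte whether it is fetched as code or accessed as data, which is exactly what makes a single shared segment convenient for small programs.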

    And how can you be sure that you don't overwrite anything important? The assembler can't help you with that. You have to check for yourself whether the segment:offset address you are writing to already contains data.
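
    One way to do that check by hand is to convert both logical addresses to physical addresses and compare them. The following Python sketch assumes the usual 8086 rule (segment * 16 + offset, wrapped to 20 bits); the helper names are hypothetical, and the overlap test deliberately ignores wrap-around at the 1 MB boundary.

```python
# Sketch for checking whether two segment:offset pairs alias the same byte.
# Helper names are hypothetical; the 1 MB wrap-around is ignored in
# segments_overlap to keep the example short.


def physical_address(segment: int, offset: int) -> int:
    """Return segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


def same_byte(a: tuple, b: tuple) -> bool:
    """True if two (segment, offset) pairs refer to the same physical byte."""
    return physical_address(*a) == physical_address(*b)


def segments_overlap(seg_a: int, seg_b: int) -> bool:
    """True if two 64 KB segments share at least one physical byte."""
    return abs((seg_a << 4) - (seg_b << 4)) < 0x10000


print(same_byte((0x1000, 0x0010), (0x1001, 0x0000)))  # True
print(segments_overlap(0x1000, 0x1800))               # True
print(segments_overlap(0x1000, 0x2000))               # False
```

    Segment bases are only 16 bytes apart, so any two segment values that differ by less than 0x1000 describe overlapping 64 KB windows.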