c assembly x86-16 tasm turbo-c++

How to convert Turbo-C far-pointer assignment and dereference to x86 assembly?


I have an assembly exam coming up and I have a question about pointers in assembly. I'm working on an exercise but I can't solve it.

Consider the statements in C:

int x=100, y=200;
int far *ptx;
int far *pty;

Assume that the following statements have already been executed:

ptx=&x;
pty=(int *)malloc(sizeof(int));   

My question is how to code the following in assembly:

  1. ptx=pty
  2. *ptx=*pty

Solution

  • Are those declarations supposed to be at global scope? If so, there will be asm labels on the static storage for the C variables. If not (locals inside a function), they'll be on the stack and IDK how they expect you to know what offset from BP they'll be at.

    Either way, they're 32-bit seg:off (little-endian so offset in the low 16 bits) far pointers, so copying one to another is just a 4-byte copy you can do with 2 integer loads + stores.
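
    As a rough illustration of that 4-byte copy for ptx = pty, here is a small C model of the layout described above. The far_ptr struct and copy_far_ptr function are hypothetical names made up for this sketch, not Turbo C features. The comments show the TASM-style instructions each step would plausibly correspond to, assuming both pointers are globals reachable through DS.

    ```c
    #include <stdint.h>

    /* Hypothetical model of a 16-bit far pointer as stored in memory:
       offset in the low word, segment in the high word (little-endian). */
    typedef struct {
        uint16_t off;   /* bytes 0..1 */
        uint16_t seg;   /* bytes 2..3 */
    } far_ptr;

    /* ptx = pty: copy the pointer value itself, one 16-bit word at a time.
       Nothing is dereferenced, so no segment register has to change. */
    static void copy_far_ptr(far_ptr *ptx, const far_ptr *pty)
    {
        uint16_t ax;

        ax = pty->off;      /* mov ax, word ptr [pty]     */
        ptx->off = ax;      /* mov word ptr [ptx], ax     */
        ax = pty->seg;      /* mov ax, word ptr [pty+2]   */
        ptx->seg = ax;      /* mov word ptr [ptx+2], ax   */
    }
    ```

    After the copy, both variables hold the same seg:off pair, so they point at the same int.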

    Pointer variables (when they don't optimize away or into a register) store the pointer value itself in memory, just like an int or long. In C when you do *pty, the compiler has to load the pointer value into registers, then do another load of the pointed-to memory.


    I'm going to assume that DS refers to the data segment where the pointer values themselves are stored in memory. And that sizeof(int)=2, because that seems likely for a 16-bit C implementation.

    To dereference and load the memory pointed-to by pty, i.e. *pty, you need to load the segment part of the far pointer into a segment register, and the offset part into SI, DI, or BX (registers that can be used as part of an addressing mode). x86 has instructions for that, like les / lds.

    Since we probably don't want to modify DS, I'll just use ES. (Different assemblers use different syntax for segment overrides, like [es:di] for NASM but I think maybe es:[di] for TASM.)

    ;; *ptx = *pty
    ;; clobbers: ES, DI, and AX
    ; load *pty
        les  di, [pty]        ; load pty  from [DS:pty] into ES:DI
        mov  ax, es:[di]      ; load *pty into AX
    
    ; store *ptx
        les  di, [ptx]        ; load ptx  from [DS:ptx] into ES:DI
        stosw                 ; store to *ptx from AX
    

    STOSW stores AX to ES:DI and increments or decrements DI according to the direction flag, DF. We don't care about the value of DI after this instruction runs, but the standard calling convention for Turbo C++ (and modern x86 conventions) says DF=0 (increment upward) on function entry/exit.

    Use plain mov with another segment override if you haven't learned about string instructions yet.
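
    To make the two-step dereference and the STOSW side effect concrete, here is a minimal C model of the LES / MOV / LES / STOSW sequence for *ptx = *pty. Everything in it (mem, linear, read16, write16, les_di, stosw, assign_through) is a hypothetical name for this sketch, and it assumes 8086-style real-mode addressing where the physical address is segment * 16 + offset.

    ```c
    #include <stdint.h>

    /* Hypothetical model: 1 MiB of real-mode memory and the registers used. */
    static uint8_t  mem[1L << 20];
    static uint16_t AX, DI, ES;
    static int      DF = 0;              /* direction flag, 0 = increment */

    /* Real-mode address: segment * 16 + offset. The mask models the
       8086 20-bit wrap and keeps the array index in range. */
    static uint32_t linear(uint16_t seg, uint16_t off)
    {
        return (((uint32_t)seg << 4) + off) & 0xFFFFFu;
    }

    /* 16-bit little-endian load: low byte first. */
    static uint16_t read16(uint16_t seg, uint16_t off)
    {
        return (uint16_t)(mem[linear(seg, off)] |
                          (mem[linear(seg, (uint16_t)(off + 1))] << 8));
    }

    /* 16-bit little-endian store: low byte first. */
    static void write16(uint16_t seg, uint16_t off, uint16_t val)
    {
        mem[linear(seg, off)] = (uint8_t)(val & 0xFF);
        mem[linear(seg, (uint16_t)(off + 1))] = (uint8_t)(val >> 8);
    }

    /* les di, [addr]: low word of the far pointer -> DI, high word -> ES. */
    static void les_di(uint16_t ds, uint16_t addr)
    {
        DI = read16(ds, addr);
        ES = read16(ds, (uint16_t)(addr + 2));
    }

    /* stosw: store AX at ES:DI, then step DI by 2 in the direction of DF. */
    static void stosw(void)
    {
        write16(ES, DI, AX);
        DI = (uint16_t)(DF ? DI - 2 : DI + 2);
    }

    /* *ptx = *pty, where ptx_addr and pty_addr are the offsets of the two
       pointer variables inside the data segment ds. */
    static void assign_through(uint16_t ds, uint16_t ptx_addr, uint16_t pty_addr)
    {
        les_di(ds, pty_addr);     /* les di, [pty]    */
        AX = read16(ES, DI);      /* mov ax, es:[di]  */
        les_di(ds, ptx_addr);     /* les di, [ptx]    */
        stosw();                  /* stosw            */
    }
    ```

    With DF=0, DI ends up 2 past the stored word, which is the side effect the plain mov version avoids.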

    (@MichaelPetch says DS is normally call-preserved in 16-bit real mode calling conventions, but that ES can be freely clobbered without saving/restoring it, so apparently I guessed right.)


    Or if you can clobber DS and ES, you can use MOVSW. Wrapping this in push ds / pop ds to save and restore DS would add two instructions, but the total code size would still be smaller than the STOSW version.

    ;; *ptx = *pty
    ;; assuming DS is correct for referencing static data like [pty]
        les  di, [ptx]        ; load ptx  from [DS:ptx] into ES:DI (destination)
        lds  si, [pty]        ; load pty  from [DS:pty] into DS:SI (source)
        movsw                 ; copy a word from [DS:SI] to [ES:DI]
    

    Note that I used lds second, because I'm assuming both globals in static storage are accessible through the incoming value of DS, not whatever segment value is part of the other far pointer.
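
    The ordering constraint can be sketched in C as well. This is only a model under the same assumptions (real-mode seg * 16 + off addressing, DF=0, both pointer variables in the segment DS points to on entry). The names mem, rd16, wr16, les_di, lds_si, movsw and copy_word are hypothetical. For *ptx = *pty the destination pointer ptx goes into ES:DI and the source pointer pty into DS:SI, and the load that overwrites DS has to come last.

    ```c
    #include <stdint.h>

    /* Hypothetical model of real-mode memory and the registers MOVSW uses. */
    static uint8_t  mem[1L << 20];
    static uint16_t DS, ES, SI, DI;

    /* Byte index for seg:off, i.e. seg * 16 + off, masked to 20 bits. */
    static uint32_t at(uint16_t seg, uint16_t off)
    {
        return (((uint32_t)seg << 4) + off) & 0xFFFFFu;
    }

    static uint16_t rd16(uint16_t seg, uint16_t off)
    {
        return (uint16_t)(mem[at(seg, off)] |
                          (mem[at(seg, (uint16_t)(off + 1))] << 8));
    }

    static void wr16(uint16_t seg, uint16_t off, uint16_t v)
    {
        mem[at(seg, off)] = (uint8_t)(v & 0xFF);
        mem[at(seg, (uint16_t)(off + 1))] = (uint8_t)(v >> 8);
    }

    /* Both loads find the pointer variable through the CURRENT value of DS. */
    static void les_di(uint16_t addr)
    {
        DI = rd16(DS, addr);
        ES = rd16(DS, (uint16_t)(addr + 2));
    }

    static void lds_si(uint16_t addr)
    {
        uint16_t off = rd16(DS, addr);
        uint16_t seg = rd16(DS, (uint16_t)(addr + 2));
        SI = off;
        DS = seg;   /* from here on DS no longer names the data segment */
    }

    /* movsw with DF=0: copy one word from DS:SI to ES:DI, advance both. */
    static void movsw(void)
    {
        wr16(ES, DI, rd16(DS, SI));
        SI = (uint16_t)(SI + 2);
        DI = (uint16_t)(DI + 2);
    }

    /* *ptx = *pty: destination into ES:DI first, source into DS:SI last. */
    static void copy_word(uint16_t ptx_addr, uint16_t pty_addr)
    {
        uint16_t saved_ds = DS;   /* push ds        */
        les_di(ptx_addr);         /* les di, [ptx]  */
        lds_si(pty_addr);         /* lds si, [pty]  */
        movsw();                  /* movsw          */
        DS = saved_ds;            /* pop ds         */
    }
    ```

    If lds_si ran first, les_di would look up the second pointer variable in whatever segment the first pointer happened to name, not in the data segment.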

    If you had a "huge" or "large" memory model (or other model where not all static data is known to fit in one 64k segment), this would be more complicated, but your question didn't show anything about where ptx and pty are actually stored.


    Also, I'm assuming you aren't supposed to optimize them away based on how they were recently assigned, even though the question shows you what they point to.

    If you know ptx = &x, then you don't need to load ptx from memory, you can just mov [x], ax (again assuming a memory model where static data like x is reachable via DS).

    Also, it makes little sense to read from *pty when it's pointing at freshly-malloced storage, because that's uninitialized. The other way would make sense. I'm probably over-analyzing it.