c++ assembly x86 emu8086

How to handle forward referencing when Label size is not fixed


I am trying to write an 8086 emulator in C++, but I am facing a problem.

Suppose the code is:

MOV AL, BL
JMP X
MOV BL, CL
MOV DL, CL
.
.
.
X:
ADD AX, BX
HLT

Now the machine code for JMP X depends on how far away X is, i.e. whether it can be reached with a short jump or needs a near jump:

short: 8-bit displacement (-128 to +127 bytes)

near: 16-bit displacement

So if the size of the JMP instruction were constant (fixed size), I could just move on and, whenever I find X, go back and fill in its address. But here I can't move on, because the address of every following instruction also depends on the size of JMP X, which is not fixed.

I have no idea how to deal with it.


Solution

  • JMP is even more complicated than that. These are its possible opcodes and their meanings (the 32-bit forms do not exist on the 8086):

    EB cb   JMP rel8        Jump short, relative, displacement relative to next instruction.
    E9 cw   JMP rel16       Jump near, relative, displacement relative to next instruction.
    E9 cd   JMP rel32       Jump near, relative, displacement relative to next instruction.
    FF /4   JMP r/m16       Jump near, absolute indirect, address given in r/m16.
    FF /4   JMP r/m32       Jump near, absolute indirect, address given in r/m32.
    EA cd   JMP ptr16:16    Jump far, absolute, address given in operand.
    EA cp   JMP ptr16:32    Jump far, absolute, address given in operand.
    FF /5   JMP m16:16      Jump far, absolute indirect, address given in m16:16.
    FF /5   JMP m16:32      Jump far, absolute indirect, address given in m16:32.
    

    So, you need to consider more special cases.
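
    For the two relative forms, the choice can be made from the displacement alone. The following is a minimal sketch, assuming 16-bit 8086 addressing within a single segment; the function name encodeRelativeJmp and its parameters are made up for illustration and are not part of the question's code.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: encode a relative JMP placed at offset `at`
// that should land on offset `target` (both within one 64 KiB segment).
std::vector<std::uint8_t> encodeRelativeJmp(std::uint16_t at, std::uint16_t target) {
    // Short form (EB cb) is 2 bytes long; the displacement is measured
    // from the address of the next instruction, i.e. at + 2.
    int disp = static_cast<int>(target) - (static_cast<int>(at) + 2);
    if (disp >= -128 && disp <= 127) {
        return {0xEB, static_cast<std::uint8_t>(disp & 0xFF)};
    }
    // Near form (E9 cw) is 3 bytes long; the displacement is measured
    // from at + 3 and wraps modulo 64 KiB. It is stored little-endian.
    std::uint16_t disp16 = static_cast<std::uint16_t>(target - (at + 3));
    return {0xE9,
            static_cast<std::uint8_t>(disp16 & 0xFF),
            static_cast<std::uint8_t>(disp16 >> 8)};
}
```

    Under these assumptions, a jump at offset 0x0002 to a label at 0x0008 comes out as EB 04, while the same jump to a label at 0x0200 comes out as E9 FB 01. The difficulty from the question remains: the target offset is only known once the sizes of all instructions in between are known.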

    The solution is to implement a multipass assembler. In any case, you need to store all the opcodes and operands in a std::vector or a similar container. Then you can fill in the correct data in a second pass.

    If you define a struct for the opcode and operands of one instruction and store all these structs in a std::vector, updating one entry does not affect the stored data of the other entries. You can also run multiple passes until everything is correct.

    Then, when everything is fixed, you can go over the std::vector again and emit the data that is actually needed.

    You may come up with:

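    #include <cstddef>
    #include <vector>

    // One stored instruction: opcode plus up to three operands.
    struct Operation {
        bool updateNeeded{false};      // set while this entry must still be fixed in a later pass
        unsigned int opcode{};
        unsigned long operand1{};
        unsigned long operand2{};
        unsigned long operand3{};
        std::size_t indexOfRelated{};  // index of the related entry in the vector, e.g. a jump target
    };

    // All instructions of the program, in source order.
    std::vector<Operation> operation;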
    

    Of course you can add more attributes as needed.

    Then you can read the source data and fill the std::vector. After having read the complete source code, you go over the data again and fix the open issues.
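
    One of those open issues is connecting each label reference to the place where the label is defined. This could be sketched as follows, assuming a simplified record per source line; ParsedLine, its fields, and resolveLabels are hypothetical names for illustration, with indexOfRelated and updateNeeded borrowed from the struct above.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified record for one parsed source line.
struct ParsedLine {
    std::string label;             // non-empty if the line defines a label, e.g. "X"
    std::string mnemonic;          // e.g. "JMP" or "MOV"
    std::string targetLabel;       // non-empty if the operand refers to a label
    std::size_t indexOfRelated{};  // filled in by resolveLabels
    bool updateNeeded{false};      // set for entries a later pass must still fix
};

// First loop: remember the index at which each label is defined.
// Second loop: point every label reference at that index.
void resolveLabels(std::vector<ParsedLine>& lines) {
    std::unordered_map<std::string, std::size_t> labelIndex;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        if (!lines[i].label.empty()) {
            labelIndex[lines[i].label] = i;
        }
    }
    for (auto& line : lines) {
        if (line.targetLabel.empty()) continue;
        auto it = labelIndex.find(line.targetLabel);
        if (it == labelIndex.end()) {
            throw std::runtime_error("undefined label: " + line.targetLabel);
        }
        line.indexOfRelated = it->second;
        line.updateNeeded = true;  // size and displacement are still open
    }
}
```

    Because all lines are stored before any reference is resolved, a forward reference such as JMP X is handled exactly like a backward one.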

    Then, hand this over to the virtual machine, or emit the final instructions.
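
    The "multiple passes until everything is correct" step could look roughly like the sketch below. It is one common way to do it, not the only one: every jump starts in the short form, and a jump is widened to the near form only when its displacement does not fit into 8 bits. Instr and layout are hypothetical names, and the sketch assumes that every jump target is a valid index into the vector.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction record.
struct Instr {
    std::size_t size{};      // current encoded size in bytes
    bool isJump{false};      // true for a relative JMP to a label
    std::size_t target{};    // index of the target instruction (jumps only)
    std::size_t address{};   // computed by layout()
};

// Assign addresses and grow jumps from short (2 bytes) to near (3 bytes)
// until a pass changes nothing. Returns the number of passes taken.
int layout(std::vector<Instr>& code) {
    int passes = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        ++passes;
        // Recompute every address from the current sizes.
        std::size_t addr = 0;
        for (auto& ins : code) {
            ins.address = addr;
            addr += ins.size;
        }
        // Widen any short jump whose displacement does not fit in 8 bits.
        for (auto& ins : code) {
            if (!ins.isJump || ins.size != 2) continue;
            long disp = static_cast<long>(code[ins.target].address) -
                        static_cast<long>(ins.address + 2);
            if (disp < -128 || disp > 127) {
                ins.size = 3;
                changed = true;
            }
        }
    }
    return passes;
}
```

    Since a jump only ever grows in this scheme, the loop is guaranteed to stop. For a short jump followed by 200 bytes of other instructions and then the target, the first pass widens the jump and the second pass confirms that nothing else changes.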