
How exactly DO lines of code relate to each other in assembly code when jump is involved?

Okay, so I understand what mov means, I understand what the registers are, I understand what the operation commands are. I even understand that the leftmost hexadecimal is the instruction's number. For example, on line 7, the hexadecimal 7f is instruction jg. FINE.

What I don't get is HOW EXACTLY these facts add up, and it's incredibly frustrating.

What I know so far:

Like for example, on line 1 does 0d add to line 804839c? No, it jumps to 80483ab (silly+0x17) because 0d is added to the address of the instruction AFTER line 1. If you add 0d to the address 804839e, you get 80483ab. GOOD.

Does this mean that all jumps are relative to the next line, with the second two-digit hexadecimal as the offset?

Does that mean the leftmost hexadecimal is the current line's instruction?

I just need a little more direction, I am so close to figuring this out that I can almost taste it.

1 804839c: 7e 0d      jle   80483ab <silly+0x17>
2 804839e: 89 d0      mov   %edx,%eax
3 80483a0: d1 f8      sar   %eax
4 80483a2: 29 c2      sub   %eax,%edx
5 80483a4: 8d 14 52   lea   (%edx,%edx,2),%edx
6 80483a7: 85 d2      test  %edx,%edx
7 80483a9: 7f f3      jg    804839e <silly+0xa>
8 80483ab: 89 d0      mov   %edx,%eax 


  • If you are confused about the opcode you are a long way from understanding this. You need to start with documentation on the instruction set. For x86 this is plentiful; it's not great documentation, but still the opcodes are pretty clear. With instruction sets like this, it's not hard to find a web page with a chart of opcodes and then you click on that to find the rest of the instruction definition.

    It is fairly typical for the relative address to be based on the byte after the instruction. If you were working on a team for a brand new processor, you would just go down to one of the chip folks' cubes and ask (since it wouldn't be well documented yet). Since this is an old design, there are tools available that will simply give you your answer without asking anyone else.

    Try this:

    a0: jle a0
    a1: jle a1
    a2: jle a2
    a3: jle a3
    a4: jle a4
    b0: jle b1
    b1: jle b2
    b2: jle b3
    b3: jle b4
    b4: jle b5
    b5: nop
    c0: jle c0
    c1: jle c0
    c2: jle c0
    c3: jle c0
    c4: jle c0
    d0: jle d4
    d1: jle d4
    d2: jle d4
    d3: jle d4
    d4: jle d4

    Assemble and disassemble:

    0000000000000000 <a0>:
       0:   7e fe                   jle    0 <a0>
    0000000000000002 <a1>:
       2:   7e fe                   jle    2 <a1>
    0000000000000004 <a2>:
       4:   7e fe                   jle    4 <a2>
    0000000000000006 <a3>:
       6:   7e fe                   jle    6 <a3>
    0000000000000008 <a4>:
       8:   7e fe                   jle    8 <a4>
    000000000000000a <b0>:
       a:   7e 00                   jle    c <b1>
    000000000000000c <b1>:
       c:   7e 00                   jle    e <b2>
    000000000000000e <b2>:
       e:   7e 00                   jle    10 <b3>
    0000000000000010 <b3>:
      10:   7e 00                   jle    12 <b4>
    0000000000000012 <b4>:
      12:   7e 00                   jle    14 <b5>
    0000000000000014 <b5>:
      14:   90                      nop
    0000000000000015 <c0>:
      15:   7e fe                   jle    15 <c0>
    0000000000000017 <c1>:
      17:   7e fc                   jle    15 <c0>
    0000000000000019 <c2>:
      19:   7e fa                   jle    15 <c0>
    000000000000001b <c3>:
      1b:   7e f8                   jle    15 <c0>
    000000000000001d <c4>:
      1d:   7e f6                   jle    15 <c0>
    000000000000001f <d0>:
      1f:   7e 06                   jle    27 <d4>
    0000000000000021 <d1>:
      21:   7e 04                   jle    27 <d4>
    0000000000000023 <d2>:
      23:   7e 02                   jle    27 <d4>
    0000000000000025 <d3>:
      25:   7e 00                   jle    27 <d4>
    0000000000000027 <d4>:
      27:   7e fe                   jle    27 <d4>

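    Without having to look at the documentation it looks pretty clear that 0x7E is an opcode and the byte after it is a pc-relative offset. The 0xFE on the first items implies that it is a signed offset, relative to the byte after the instruction: 0xFE as a signed byte is -2, and stepping back 2 bytes from the end of a 2-byte instruction lands on the instruction itself, which is where those branches go. The remaining experiments confirm that.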
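The rule those experiments point to can be written down and checked against the disassembly. This is only a sketch of what the listing above suggests, not something taken from the instruction set documentation; the function name `short_jump_target` is made up for illustration, and the table is a handful of rows copied from the disassembly.

```python
def short_jump_target(address, rel8):
    """Target of a 2-byte short conditional jump (opcode byte + offset byte)."""
    # Reinterpret the unsigned byte 0x00..0xFF as a signed value -128..127.
    offset = rel8 - 0x100 if rel8 >= 0x80 else rel8
    # The offset is counted from the byte after the 2-byte instruction.
    return address + 2 + offset

# (address, offset byte, target shown by the disassembler), copied from the listing.
experiments = [
    (0x00, 0xFE, 0x00),
    (0x0A, 0x00, 0x0C),
    (0x17, 0xFC, 0x15),
    (0x1D, 0xF6, 0x15),
    (0x1F, 0x06, 0x27),
    (0x25, 0x00, 0x27),
]

results = [short_jump_target(addr, rel8) for addr, rel8, _ in experiments]
for (addr, rel8, expected), got in zip(experiments, results):
    print(f"{addr:2x}: 7e {rel8:02x} -> {got:x} (disassembler says {expected:x})")
```

If the computed target ever disagreed with what the disassembler printed, that would be the signal that the assumed rule is wrong for that instruction form.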

    This doesn't mean you should assume that all jump/branch instructions work this way for this instruction set. You can do similar experiments for the others, using tools that are known to produce working code.

    This is one area where processor documentation is lacking, and you usually need to 1) talk to the silicon engineers if you can, 2) look at the chip design (source code), 3) read the documentation, 4) experiment with existing tools, 5) experiment with the hardware.

    Most folks don't have access to 1 and 2. 3 and 4 are often available if you actually have one of these processors, and by the time you get to 5 you usually have 3 and probably, but not always, 4. But again, the documentation often doesn't say what the relative address is relative to. Usually it is the byte after the instruction, but on ARM, for example, it is a fixed offset from the address of the instruction, which keeps up the illusion of a specific pipeline.

    804839c: 7e 0d      jle   80483ab <silly+0x17>

    Yes, 804839c is the address of the jle instruction. 80483ab is the address it will branch to if the condition is met. 0xab - 0x9c = 0xf = 0xD + 2. 2 is the size of the instruction, 0xD is the offset/immediate in the instruction.
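The same arithmetic can be run over both branches in the question's listing, working straight from the raw bytes. This is a sketch that assumes the jg on line 7 is encoded the same way as the jle (opcode byte, then signed offset byte), which is what the disassembly suggests but which you should still confirm in the docs.

```python
# Raw bytes of the two branches from the question's listing: (address, bytes).
branches = [
    (0x804839C, bytes([0x7E, 0x0D])),  # jle on line 1
    (0x80483A9, bytes([0x7F, 0xF3])),  # jg on line 7
]

targets = []
for address, code in branches:
    # The second byte is the offset; signed=True turns 0xF3 into -13.
    offset = int.from_bytes(code[1:2], "little", signed=True)
    # The offset is counted from the first byte after the instruction.
    next_instruction = address + len(code)
    targets.append(next_instruction + offset)
    print(f"{address:x}: offset {offset:+d} from {next_instruction:x} -> {targets[-1]:x}")
```

The forward branch lands on 80483ab and the backward branch on 804839e, matching the targets objdump printed for lines 1 and 7.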

    I would assume the other conditional branches of this form (notice the jg later in your code) are an opcode byte and a signed offset byte. But you should always check before making your own assembler or disassembler or simulator. Start with the docs, and confirm with any tools you can find that are known to work for that platform.
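If you do end up writing your own assembler, the calculation runs in the other direction: given where the branch sits and where it should go, produce the offset byte. A minimal sketch, assuming the 2-byte short form seen above; `encode_rel8` is a hypothetical helper name, and a real assembler would switch to a longer encoding rather than give up when the target is out of range.

```python
def encode_rel8(address, target, instruction_size=2):
    """Offset byte for a short jump located at `address` that goes to `target`."""
    # Distance is measured from the byte after the instruction.
    offset = target - (address + instruction_size)
    if not -128 <= offset <= 127:
        raise ValueError(f"offset {offset} does not fit in a signed byte")
    # Store the signed offset as a two's-complement byte.
    return offset & 0xFF

# Round-trip against the bytes in the question's listing.
jle_byte = encode_rel8(0x804839C, 0x80483AB)
jg_byte = encode_rel8(0x80483A9, 0x804839E)
print(f"jle offset byte: {jle_byte:02x}, jg offset byte: {jg_byte:02x}")
```

Comparing the bytes this produces with what a known-good assembler emits is the same kind of check as the experiment above.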