Tags: c, assembly, compiler-construction, x86

Why is an object file broken into a grid of 8 columns of 4 hexadecimal digits each?


After asking about the relation between assembly and machine code, I am beginning to read through the Intel 64 instruction set reference.

There is still a lot to learn here, but after looking through the first two chapters (I need to study chapter 2 much more), I don't feel any closer to understanding what the machine code means. Maybe after reading all 1300+ pages, The Art of Assembly, and perhaps a CS architecture course, how this applies in practice will start to make sense.

But in the meantime, can you explain why the numbers in a compiled assembly file (or any "binary", which as I understand it is just machine code) are organized into a grid of 8 columns with 4 hexadecimal digits each? This may be obvious to you, but I have no idea whether it means anything.

cffa edfe 0700 0001 0300 0000 0100 0000
0200 0000 0001 0000 0000 0000 0000 0000
1900 0000 e800 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
2e00 0000 0000 0000 2001 0000 0000 0000
2e00 0000 0000 0000 0700 0000 0700 0000
0200 0000 0000 0000 5f5f 7465 7874 0000
0000 0000 0000 0000 5f5f 5445 5854 0000
0000 0000 0000 0000 0000 0000 0000 0000
2000 0000 0000 0000 2001 0000 0000 0000
5001 0000 0100 0000 0005 0080 0000 0000
0000 0000 0000 0000 5f5f 6461 7461 0000
0000 0000 0000 0000 5f5f 4441 5441 0000
0000 0000 0000 0000 2000 0000 0000 0000
0e00 0000 0000 0000 4001 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0200 0000 1800 0000
5801 0000 0400 0000 9801 0000 1c00 0000
e800 0000 00b8 0400 0002 bf01 0000 0048
be00 0000 0000 0000 00ba 0e00 0000 0f05
4865 6c6c 6f2c 2077 6f72 6c64 210a 0000
1100 0000 0100 000e 0700 0000 0e01 0000
0500 0000 0000 0000 0d00 0000 0e02 0000
2000 0000 0000 0000 1500 0000 0200 0000
0e00 0000 0000 0000 0100 0000 0f01 0000
0000 0000 0000 0000 0073 7461 7274 0077
7269 7465 006d 6573 7361 6765 006c 656e
6774 6800
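The grid itself is only a display choice: each group of four hex digits is two bytes, and each row is 16 bytes. The sketch below prints the first row of the dump with different group sizes; `format_hex` is a hypothetical helper, not part of any tool. Two bytes per group and eight groups per row reproduces the layout above, which also matches what `xxd` prints by default once its offset and ASCII columns are removed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write n bytes as hex into out: `group` bytes per space-separated group,
   `per_line` groups per line. Returns the number of characters written
   (not counting the terminating NUL), or -1 if out is too small. */
int format_hex(const unsigned char *bytes, size_t n, size_t group,
               size_t per_line, char *out, size_t out_size)
{
    assert(group > 0 && per_line > 0);
    if (out_size == 0)
        return -1;
    out[0] = '\0';
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        const char *sep = "";
        if (i > 0 && i % (group * per_line) == 0)
            sep = "\n";                  /* row boundary */
        else if (i > 0 && i % group == 0)
            sep = " ";                   /* group boundary */
        int w = snprintf(out + pos, out_size - pos, "%s%02x", sep,
                         (unsigned)bytes[i]);
        if (w < 0 || (size_t)w >= out_size - pos)
            return -1;                   /* output would be truncated */
        pos += (size_t)w;
    }
    return (int)pos;
}
```

Calling it on the first row with `group = 2, per_line = 8` gives `cffa edfe 0700 0001 0300 0000 0100 0000`; with `group = 1, per_line = 16` the same bytes come out as `cf fa ed fe 07 00 00 01 ...`. The bytes are identical; only the spacing changes.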

More specifically...

As pointed out in the accepted answer to the other question about the relation between assembly and machine code, all the information is somewhere in the Intel docs. For example, at the beginning of Chapter 2, they say:

  • LOCK prefix is encoded using F0H.
  • REPNE/REPNZ prefix is encoded using F2H...

The LOCK prefix (F0H) forces an operation that ensures exclusive use of shared memory in a multiprocessor environment... Repeat prefixes (F2H, F3H) cause an instruction to be repeated for each element of a string...
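Because a prefix is an entire byte placed in front of the opcode, testing for one means comparing whole bytes of the instruction stream, not hex digits picked out of the middle of a printed group. Here is a minimal sketch limited to the three prefix bytes quoted above; the helper names are hypothetical, and the full list of legacy prefixes in Chapter 2 is longer.

```c
#include <assert.h>
#include <stddef.h>

/* Name of a prefix byte, limited to the three quoted from the Intel SDM;
   returns NULL for every other byte. */
const char *prefix_name(unsigned char b)
{
    switch (b) {
    case 0xF0: return "LOCK";
    case 0xF2: return "REPNE/REPNZ";
    case 0xF3: return "REP/REPE/REPZ";
    default:   return NULL;
    }
}

/* Number of such prefix bytes at the start of an instruction. */
size_t count_prefixes(const unsigned char *insn, size_t len)
{
    assert(insn != NULL || len == 0);
    size_t i = 0;
    while (i < len && prefix_name(insn[i]) != NULL)
        i++;
    return i;
}
```

For instance, `f0 0f c1 07` (`lock xadd dword ptr [rdi], eax` in 64-bit code) starts with one prefix byte, while `bf 01 00 00 00` from the dump starts with none.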

I understand that by F0H, they just mean "f0, which is a hexadecimal number, in case that isn't clear". So you can find that number a couple of times in the machine code above. For example, near the bottom, the 6th column contains bf01.

Without knowing much more than this, I am trying to connect the very specific (but not very practical) Intel docs with some actual machine code, so I can start to really "get" how the Intel docs are applied.

As a first step in that process of understanding, I am wondering this:

  1. Is the f0 in that bf01 the same thing that the intel docs are describing? That is, is it the LOCK prefix F0H? Or if not, how do you know that?
  2. Why are the numbers in a grid of 8 columns of 4 numbers each?
  3. If the f0 in the bf01 chunk does mean the LOCK prefix, why does it start at an odd position (that is, not at an even position like position 0 or 2 in a column)? This is the main reason for this whole question. If it can appear at an odd position, is breaking the numbers into 8 columns of 4 digits each just arbitrary (i.e. it only makes the dump look pretty)? After all, if every opcode is at least 2 characters long, an opcode should never start at an odd position.
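Questions 1 and 3 can be checked by treating the dump as bytes. Every two hex digits form one byte, counted from the start of the data, so `bf01` is the bytes `0xBF` and `0x01`; the `f0` only appears if you read across the boundary between them, which is why it sits at an odd position. The sketch below parses the row containing `bf01` into bytes, confirms that none of them is `0xF0`, and decodes the `B8+rd` encoding of `MOV r32, imm32` from the Intel reference, under which `0xBF` selects EDI, so `bf 01 00 00 00` is `mov edi, 1`. All function names are hypothetical, and the decoder handles only that one opcode form.

```c
#include <assert.h>
#include <ctype.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = (char)tolower((unsigned char)c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Parse dump text such as "e800 0000 00b8" into bytes, skipping whitespace.
   Digits are paired in order, so every byte begins at an even digit
   position. Returns the byte count, or -1 on malformed input. */
long parse_hex(const char *text, unsigned char *out, size_t cap)
{
    assert(text != NULL && out != NULL);
    size_t n = 0;
    int high = -1;                       /* pending high nibble, if any */
    for (const char *p = text; *p != '\0'; p++) {
        if (isspace((unsigned char)*p))
            continue;
        int v = hex_value(*p);
        if (v < 0 || (high < 0 && n == cap))
            return -1;
        if (high < 0) {
            high = v;
        } else {
            out[n++] = (unsigned char)((high << 4) | v);
            high = -1;
        }
    }
    return high < 0 ? (long)n : -1;      /* odd digit count is an error */
}

/* Offset of the first byte equal to v, or -1 if there is none. */
long find_byte(const unsigned char *b, size_t n, unsigned char v)
{
    const unsigned char *hit = memchr(b, v, n);
    return hit ? (long)(hit - b) : -1;
}

/* Decode MOV r32, imm32 (opcode B8+rd, then a little-endian imm32).
   Returns 1 and writes text to buf, or 0 if b is not that instruction. */
int decode_mov_r32_imm32(const unsigned char *b, size_t len,
                         char *buf, size_t cap)
{
    static const char *const regs[8] =
        {"eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"};
    if (len < 5 || b[0] < 0xB8 || b[0] > 0xBF)
        return 0;
    uint32_t imm = (uint32_t)b[1] | (uint32_t)b[2] << 8 |
                   (uint32_t)b[3] << 16 | (uint32_t)b[4] << 24;
    snprintf(buf, cap, "mov %s, 0x%" PRIx32, regs[b[0] - 0xB8], imm);
    return 1;
}
```

Parsing `e800 0000 00b8 0400 0002 bf01 0000 0048` gives 16 bytes with `0xBF` at offset 10 and no `0xF0` anywhere. The same decoder turns `b8 04 00 00 02`, five bytes earlier in that row, into `mov eax, 0x2000004`.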

Solution

  • Why are the numbers in a grid of 8 columns of 4 numbers each?

    This is simply how you, or the tool you are using, chose to display them. Each group of four hex digits is two bytes and each row is 16 bytes; the grouping means nothing to the CPU. I personally would display individual bytes rather than two-byte words, and I would choose the number of columns depending on how I am going to display or print out the hex dump.

    The best way to study hex dumps of machine code is with a disassembler. There is an online one here. For example, it disassembles the following hex dump

    55 31 D2 89 E5 8B 45 08 56 8B 75 0C 53 8D 58 FF 
    0F B6 0C 16 88 4C 13 01 83 C2 01 84 C9 75 F1 5B
    5E 5D C3
    

    to

        .data:0x00000000    55          push   ebp  
        .data:0x00000001    31d2        xor    edx,edx  
        .data:0x00000003    89e5        mov    ebp,esp  
        .data:0x00000005    8b4508      mov    eax,DWORD PTR [ebp+0x8]  
        .data:0x00000008    56          push   esi  
        .data:0x00000009    8b750c      mov    esi,DWORD PTR [ebp+0xc]  
        .data:0x0000000c    53          push   ebx  
        .data:0x0000000d    8d58ff      lea    ebx,[eax-0x1]    
        .data:0x00000010            
        .data:0x00000010        loc_00000010:   
    ┏▶  .data:0x00000010    0fb60c16    movzx  ecx,BYTE PTR [esi+edx*1] 
    ┃   .data:0x00000014    884c1301    mov    BYTE PTR [ebx+edx*1+0x1],cl  
    ┃   .data:0x00000018    83c201      add    edx,0x1  
    ┃   .data:0x0000001b    84c9        test   cl,cl    
    ┗   .data:0x0000001d    75f1        jne    loc_00000010 
        .data:0x0000001f    5b          pop    ebx  
        .data:0x00000020    5e          pop    esi  
        .data:0x00000021    5d          pop    ebp
        .data:0x00000022    c3          ret
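Several one-byte lines in this listing come from opcodes that carry the register in their low three bits: the Intel reference lists PUSH r32 as `50+rd` and POP r32 as `58+rd`, and `C3` is a near RET. A minimal sketch that decodes only these forms in 32-bit code is enough to reproduce the `55`, `56`, `53`, `5b`, `5e`, `5d` and `c3` lines. The function names are hypothetical, and this is nowhere near a full disassembler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Register names in rd order (0-7) for 32-bit operands. */
static const char *const regs32[8] =
    {"eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"};

/* Decode PUSH r32 (50+rd), POP r32 (58+rd) or RET (C3) into buf.
   Returns 1 on success, 0 if op is none of these. */
int decode_one_byte(unsigned char op, char *buf, size_t cap)
{
    assert(buf != NULL && cap > 0);
    if (op >= 0x50 && op <= 0x57)
        snprintf(buf, cap, "push %s", regs32[op - 0x50]);
    else if (op >= 0x58 && op <= 0x5F)
        snprintf(buf, cap, "pop %s", regs32[op - 0x58]);
    else if (op == 0xC3)
        snprintf(buf, cap, "ret");
    else
        return 0;
    return 1;
}

/* Decode consecutive one-byte instructions into out, joined by "; ".
   Stops at the first byte this sketch does not handle and returns the
   number of bytes decoded. */
size_t decode_run(const unsigned char *code, size_t n, char *out, size_t cap)
{
    char one[16];
    size_t i;
    assert(out != NULL && cap > 0);
    out[0] = '\0';
    for (i = 0; i < n && decode_one_byte(code[i], one, sizeof one); i++) {
        size_t used = strlen(out);
        snprintf(out + used, cap - used, "%s%s", i > 0 ? "; " : "", one);
    }
    return i;
}
```

Decoding the last four bytes `5b 5e 5d c3` yields `pop ebx; pop esi; pop ebp; ret`. Decoding from the start stops after `55` because `31` (`xor edx,edx`) needs a ModR/M byte, which is where a real disassembler's opcode tables take over.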