After asking about the relation between assembly and machine code, I am beginning to read through the Intel 64 instruction set reference.
There is still a lot to learn here, but after looking through the first two chapters (need to study chapter 2 much more), I don't feel any closer to understanding what the machine code means yet. Maybe after reading all 1300+ pages, and the Art of Assembly, and perhaps a CS architecture course, how this applies in practice will start to make sense.
But in the meantime, can you explain why the numbers in a compiled assembly file (or any "binary" I guess is what you'd call it, which is just machine code in my understanding) are organized into a grid of 8 columns with 4 hexadecimal digits each? This may be obvious to you but I have no idea if it means anything or not.
cffa edfe 0700 0001 0300 0000 0100 0000
0200 0000 0001 0000 0000 0000 0000 0000
1900 0000 e800 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
2e00 0000 0000 0000 2001 0000 0000 0000
2e00 0000 0000 0000 0700 0000 0700 0000
0200 0000 0000 0000 5f5f 7465 7874 0000
0000 0000 0000 0000 5f5f 5445 5854 0000
0000 0000 0000 0000 0000 0000 0000 0000
2000 0000 0000 0000 2001 0000 0000 0000
5001 0000 0100 0000 0005 0080 0000 0000
0000 0000 0000 0000 5f5f 6461 7461 0000
0000 0000 0000 0000 5f5f 4441 5441 0000
0000 0000 0000 0000 2000 0000 0000 0000
0e00 0000 0000 0000 4001 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0200 0000 1800 0000
5801 0000 0400 0000 9801 0000 1c00 0000
e800 0000 00b8 0400 0002 bf01 0000 0048
be00 0000 0000 0000 00ba 0e00 0000 0f05
4865 6c6c 6f2c 2077 6f72 6c64 210a 0000
1100 0000 0100 000e 0700 0000 0e01 0000
0500 0000 0000 0000 0d00 0000 0e02 0000
2000 0000 0000 0000 1500 0000 0200 0000
0e00 0000 0000 0000 0100 0000 0f01 0000
0000 0000 0000 0000 0073 7461 7274 0077
7269 7465 006d 6573 7361 6765 006c 656e
6774 6800
More specifically...
As pointed out in the selected answer in the other question about the relation between assembly and machine code, all the information is at least somewhere in the Intel docs. For example, at the beginning of Chapter 2, they say these things:
- LOCK prefix is encoded using F0H.
- REPNE/REPNZ prefix is encoded using F2H...
The LOCK prefix (F0H) forces an operation that ensures exclusive use of shared memory in a multiprocessor environment... Repeat prefixes (F2H, F3H) cause an instruction to be repeated for each element of a string...
I understand that by F0H, they really just mean "f0, which is a hexadecimal number in case that isn't clear". So then you can find that number a couple of times in the machine code above. For example, near the bottom in the 6th column is bf01.
Without knowing much more than this, I am trying to put together the very specific (but not very practical) Intel docs with some actual machine code, so I can start to really "get" how the Intel docs are actually applied.
As a first step in that process of understanding, I am wondering this:
- Is the f0 in that bf01 the same thing that the Intel docs are describing? That is, is it the LOCK prefix F0H? Or if not, how do you know that?
- If the f0 in the bf01 chunk does mean that LOCK prefix, why is it starting at an odd position (that is, it's not starting at an even position like position 0 or 2 in a column)? This is the main reason for this whole question. If it can appear at an odd position, then is breaking them into 8 columns of 4 numbers each just arbitrary (i.e. it just makes it look pretty)? Because if all opcodes are at least 2 characters, then it would never appear at an odd position.
Why are the numbers in a grid of 8 columns of 4 numbers each?
This is simply how you, or the tool you're using, chose to display them; the grouping carries no meaning of its own. I personally would display individual bytes rather than two-byte words. I would choose the number of columns depending on how I am going to display or print out the hex dump.
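To make the "it's only a display choice" point concrete, here is a minimal Python sketch that re-renders one row of the dump from the question with a different grouping. The helper name regroup and its parameters are made up for this illustration; it is not how any particular hex-dump tool works internally.

```python
def regroup(dump_text: str, bytes_per_group: int = 1, groups_per_line: int = 16) -> str:
    """Re-render a hex dump with a different grouping; the bytes are unchanged."""
    # Drop all whitespace, then decode pairs of hex digits into raw bytes.
    data = bytes.fromhex("".join(dump_text.split()))
    # Cut the byte string into groups and print each group as hex again.
    groups = [data[i:i + bytes_per_group].hex()
              for i in range(0, len(data), bytes_per_group)]
    lines = [" ".join(groups[i:i + groups_per_line])
             for i in range(0, len(groups), groups_per_line)]
    return "\n".join(lines)


# One row copied from the dump in the question (the row containing "bf01").
row = "e800 0000 00b8 0400 0002 bf01 0000 0048"

as_bytes = regroup(row)        # one byte per group
as_words = regroup(row, 2, 8)  # two bytes per group, as in the question

print(as_bytes)
print(as_words)

# The characters "f0" appear in the text only because the bytes bf and 01
# happen to be printed next to each other; check whether any byte is 0xF0.
has_f0_byte = 0xF0 in bytes.fromhex(row.replace(" ", ""))
print(has_f0_byte)
```

Both renderings decode to the same 16 bytes. Each byte is always exactly two hex digits, so a byte value such as F0 can only start at an even digit position; the "f0" seen inside bf01 straddles two different bytes.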
The best way to study hex dumps of machine code is to use a disassembler. There is an online one here. For example, it disassembles the following hex dump
55 31 D2 89 E5 8B 45 08 56 8B 75 0C 53 8D 58 FF
0F B6 0C 16 88 4C 13 01 83 C2 01 84 C9 75 F1 5B
5E 5D C3
to
.data:0x00000000 55 push ebp
.data:0x00000001 31d2 xor edx,edx
.data:0x00000003 89e5 mov ebp,esp
.data:0x00000005 8b4508 mov eax,DWORD PTR [ebp+0x8]
.data:0x00000008 56 push esi
.data:0x00000009 8b750c mov esi,DWORD PTR [ebp+0xc]
.data:0x0000000c 53 push ebx
.data:0x0000000d 8d58ff lea ebx,[eax-0x1]
.data:0x00000010
.data:0x00000010 loc_00000010:
┏▶ .data:0x00000010 0fb60c16 movzx ecx,BYTE PTR [esi+edx*1]
┃ .data:0x00000014 884c1301 mov BYTE PTR [ebx+edx*1+0x1],cl
┃ .data:0x00000018 83c201 add edx,0x1
┃ .data:0x0000001b 84c9 test cl,cl
┗ .data:0x0000001d 75f1 jne loc_00000010
.data:0x0000001f 5b pop ebx
.data:0x00000020 5e pop esi
.data:0x00000021 5d pop ebp