May I get an explanation about what opcode sequences are and how to find them in PE32 files?
I am trying to extract them from PE32 files.
what opcode sequences are
A CPU instruction is composed of one or more bytes, each of which has its own meaning.
An opcode (operation code) is the part of an instruction that defines the behavior of the instruction itself (e.g. this instruction is an 'ADD', an 'XOR', a 'NOP', etc.).
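For x86 / x64 CPUs (IA-32; IA-32e in Intel lingo), an instruction is composed of at least an opcode (1 to 3 bytes), but it can come with several other bytes depending on its encoding. In order, an instruction consists of optional prefixes, the opcode, an optional ModR/M byte, an optional SIB byte, an optional displacement (Disp.) and an optional immediate (Imm.).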
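As an illustration, here is a small Python sketch that splits one concrete 32-bit instruction, `8B 44 8B 10` (`mov eax, [ebx+ecx*4+0x10]`), into the fields listed above. The function `split_fields` is hypothetical and deliberately limited: it assumes a one-byte opcode followed by a ModR/M byte, uses 32-bit addressing, and ignores prefixes, multi-byte opcodes and any immediate the opcode may take.

```python
# Hypothetical helper: split a 32-bit x86 instruction of the form
# [opcode][ModR/M][SIB?][disp?] into its fields.
# Prefixes, multi-byte opcodes and immediates are NOT handled.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def split_fields(insn: bytes) -> dict:
    opcode = insn[0]
    modrm = insn[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    fields = {"opcode": opcode, "mod": mod, "reg": REG32[reg]}
    pos = 2
    disp_size = {0: 0, 1: 1, 2: 4, 3: 0}[mod]  # mod=01: disp8, mod=10: disp32
    if mod != 3 and rm == 4:                   # rm=100 with a memory operand: SIB follows
        sib = insn[pos]
        pos += 1
        index, base = (sib >> 3) & 7, sib & 7
        fields["scale"] = 1 << (sib >> 6)
        fields["index"] = None if index == 4 else REG32[index]  # index=100 means "none"
        if mod == 0 and base == 5:             # no base register, disp32 instead
            fields["base"] = None
            disp_size = 4
        else:
            fields["base"] = REG32[base]
    elif mod == 0 and rm == 5:                 # absolute [disp32] in 32-bit mode
        fields["base"] = None
        disp_size = 4
    elif mod != 3:
        fields["base"] = REG32[rm]             # plain [reg] memory operand
    else:
        fields["rm"] = REG32[rm]               # register-to-register form
    if disp_size:
        fields["disp"] = int.from_bytes(insn[pos:pos + disp_size], "little", signed=True)
        pos += disp_size
    fields["length"] = pos
    return fields
```

For `8B 44 8B 10` this reports opcode `0x8B` (MOV r32, r/m32), `reg` = `eax`, a SIB byte with scale 4, index `ecx` and base `ebx`, a displacement of `0x10`, and a total length of 4 bytes.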
"Opcode" is often used as a synonym for "instruction" (since the opcode defines the behavior of the instruction); therefore, when you have multiple instructions, you have an opcode sequence (which is a bit of a misnomer since it's really an instruction sequence, unless every instruction in the sequence consists only of an opcode).
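To make the distinction concrete, the sketch below takes a typical 32-bit function prologue and derives three views of it. Representing an "opcode sequence" as a list of mnemonics is an assumption (it is one common choice; some tools keep the raw opcode bytes instead), and the entries are hand-decoded rather than produced by a disassembler.

```python
# A typical 32-bit function prologue. Each entry is
# (instruction bytes, number of those bytes that are opcode bytes, mnemonic).
PROLOGUE = [
    (bytes([0x55]), 1, "push"),              # push ebp    : register encoded in the opcode itself
    (bytes([0x8B, 0xEC]), 1, "mov"),         # mov ebp, esp: opcode 8B + ModR/M EC
    (bytes([0x83, 0xEC, 0x08]), 1, "sub"),   # sub esp, 8  : opcode 83 + ModR/M EC (/5) + imm8
]

# One entry per instruction: the "opcode sequence" in the loose sense.
mnemonic_sequence = [mnemonic for _, _, mnemonic in PROLOGUE]

# Only the opcode bytes, with all operand bytes stripped.
opcode_bytes = [insn[:n] for insn, n, _ in PROLOGUE]

# Instructions that consist of nothing but their opcode.
opcode_only = [mnemonic for insn, n, mnemonic in PROLOGUE if len(insn) == n]
```

Only `push ebp` is made up purely of an opcode; the other two carry ModR/M (and immediate) bytes, which is why "instruction sequence" is the more accurate term.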
how to find them in PE32 files?
As instructions can be multiple bytes long, you can't just start decoding at a random location in the .text section (which, in a typical PE file, contains the executable code of the program): you might land in the middle of an instruction and decode garbage. Instead, there's a specific location in the PE file, called the "entry point", which defines the start of the program.
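A tiny demonstration of why the starting offset matters: the same six bytes decode to completely different instructions depending on where you start. The length table here is a hypothetical toy covering only the three opcodes used, not a real decoder.

```python
# Hypothetical toy length table covering only the three opcodes used below.
LENGTHS = {0x90: 1, 0xB8: 5, 0xC3: 1}
NAMES = {0x90: "nop", 0xB8: "mov eax, imm32", 0xC3: "ret"}


def decode_from(code: bytes, start: int) -> list:
    """Decode instructions back-to-back starting at `start`."""
    out, pos = [], start
    while pos < len(code):
        op = code[pos]
        if op not in LENGTHS or pos + LENGTHS[op] > len(code):
            raise ValueError(f"cannot decode at offset {pos}")
        out.append(NAMES[op])
        pos += LENGTHS[op]
    return out


CODE = bytes([0xB8, 0x90, 0x90, 0x90, 0x90, 0xC3])   # mov eax, 0x90909090 ; ret

print(decode_from(CODE, 0))   # ['mov eax, imm32', 'ret']
print(decode_from(CODE, 1))   # ['nop', 'nop', 'nop', 'nop', 'ret']
```

Starting one byte late turns the immediate operand of `mov` into four bogus `nop` instructions.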
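The entry point of a PE file is given by the AddressOfEntryPoint member of the IMAGE_OPTIONAL_HEADER structure (part of the PE header structures). Note that this member is an RVA (relative virtual address, i.e. an offset from the image's base address), not a "full" VA: the VA is ImageBase + AddressOfEntryPoint. To find the entry point in the file on disk, you must further translate the RVA into a file offset using the section headers.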
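Here is a minimal stdlib-only Python sketch of those steps: follow `e_lfanew` to the PE signature, read AddressOfEntryPoint and ImageBase from the optional header, then map the RVA to a file offset through the section table. The names `parse_pe32` and `rva_to_offset` are hypothetical; the sketch only accepts PE32 (optional header magic `0x10B`), not PE32+, and it ignores edge cases such as RVAs that fall inside the headers.

```python
import struct
from typing import List, NamedTuple


class Section(NamedTuple):
    name: str
    virtual_address: int
    virtual_size: int
    raw_size: int
    raw_pointer: int


class PEInfo(NamedTuple):
    image_base: int
    entry_rva: int
    sections: List[Section]


def parse_pe32(data: bytes) -> PEInfo:
    """Read the entry point RVA, ImageBase and section table of a PE32 image."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)      # offset of the PE signature
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    file_hdr = e_lfanew + 4                                  # IMAGE_FILE_HEADER (20 bytes)
    _, num_sections, _, _, _, opt_size, _ = struct.unpack_from("<HHIIIHH", data, file_hdr)
    opt_hdr = file_hdr + 20                                  # IMAGE_OPTIONAL_HEADER32
    (magic,) = struct.unpack_from("<H", data, opt_hdr)
    if magic != 0x10B:                                       # 0x20B would be PE32+ (64-bit)
        raise ValueError("not a PE32 optional header")
    (entry_rva,) = struct.unpack_from("<I", data, opt_hdr + 16)   # AddressOfEntryPoint
    (image_base,) = struct.unpack_from("<I", data, opt_hdr + 28)  # ImageBase
    sections = []
    table = opt_hdr + opt_size                               # section headers follow the optional header
    for i in range(num_sections):
        # Name, VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData
        name, vsize, vaddr, raw_size, raw_ptr = struct.unpack_from("<8sIIII", data, table + 40 * i)
        sections.append(Section(name.rstrip(b"\0").decode("ascii", "replace"),
                                vaddr, vsize, raw_size, raw_ptr))
    return PEInfo(image_base, entry_rva, sections)


def rva_to_offset(info: PEInfo, rva: int) -> int:
    """Translate an RVA into a file offset using the section table."""
    for s in info.sections:
        size = max(s.virtual_size, s.raw_size)
        if s.virtual_address <= rva < s.virtual_address + size:
            delta = rva - s.virtual_address
            if delta >= s.raw_size:
                raise ValueError("RVA lies in the section's uninitialized tail")
            return s.raw_pointer + delta
    raise ValueError(f"RVA {rva:#x} is not inside any section")
```

With a real file you would pass its bytes to `parse_pe32`, then start decoding at `rva_to_offset(info, info.entry_rva)`; the entry point's VA is `info.image_base + info.entry_rva`.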
From there you know you are at the start of an instruction. You can start disassembling (or counting) instructions from this point, following the instruction encoding rules (these rules are explained at great length in the Intel and AMD manuals).
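The sketch below shows what "following the encoding rules" means for instruction lengths: for each opcode you need to know whether a ModR/M byte follows, how many SIB/displacement bytes that ModR/M implies, and how many immediate bytes come after. The opcode table is a hypothetical toy subset (no prefixes, no two-byte opcodes); the ModR/M/SIB length rules are the standard 32-bit addressing ones. This is a plain linear sweep: it decodes back-to-back and knows nothing about control flow.

```python
# Hypothetical toy table: opcode -> (has a ModR/M byte, bytes of immediate/relative operand).
TABLE = {
    0x89: (True, 0), 0x8B: (True, 0),      # mov r/m32, r32 / mov r32, r/m32
    0x31: (True, 0), 0x33: (True, 0),      # xor
    0x83: (True, 1),                       # group 1 op r/m32, imm8 (add/sub/cmp/...)
    0x90: (False, 0), 0xC3: (False, 0),    # nop, ret
    0x74: (False, 1), 0xEB: (False, 1),    # je rel8, jmp rel8
    0xE8: (False, 4), 0xE9: (False, 4),    # call rel32, jmp rel32
}
TABLE.update({op: (False, 0) for op in range(0x50, 0x60)})  # push r32 / pop r32
TABLE.update({op: (False, 4) for op in range(0xB8, 0xC0)})  # mov r32, imm32


def modrm_length(code: bytes, pos: int) -> int:
    """Bytes taken by ModR/M + optional SIB + displacement (32-bit addressing)."""
    modrm = code[pos]
    mod, rm = modrm >> 6, modrm & 7
    if mod == 3:                             # register operand: ModR/M byte only
        return 1
    length = 1
    if rm == 4:                              # SIB byte follows
        length += 1
        if mod == 0 and (code[pos + 1] & 7) == 5:
            length += 4                      # SIB base=101 with mod=00: disp32, no base
    elif mod == 0 and rm == 5:
        length += 4                          # absolute [disp32]
    length += {0: 0, 1: 1, 2: 4}[mod]        # disp8 / disp32 selected by mod
    return length


def insn_length(code: bytes, pos: int) -> int:
    op = code[pos]
    if op not in TABLE:
        raise ValueError(f"opcode {op:#04x} at offset {pos:#x} is not in the toy table")
    has_modrm, operand_bytes = TABLE[op]
    length = 1 + (modrm_length(code, pos + 1) if has_modrm else 0) + operand_bytes
    if pos + length > len(code):
        raise ValueError(f"instruction at offset {pos:#x} runs past the buffer")
    return length


def linear_sweep(code: bytes, start: int) -> list:
    """Decode back-to-back from `start`, returning each instruction's bytes."""
    out, pos = [], start
    while pos < len(code):
        n = insn_length(code, pos)
        out.append(code[pos:pos + n])
        pos += n
    return out
```

For example, `55 8B EC 83 EC 08 8B 44 8B 10 C3` splits into instructions of 1, 2, 3, 4 and 1 bytes (`push ebp`, `mov ebp, esp`, `sub esp, 8`, `mov eax, [ebx+ecx*4+0x10]`, `ret`).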
Most instructions are "fall-through", which means that once an instruction has executed, the next one to execute is the one that immediately follows it (this seems obvious, but it matters!). The tricky part is non-fall-through instructions: to continue disassembling, you must know what such an instruction does (e.g. it might jump somewhere, call a function, return, go to a specific handler, etc.).
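Handling non-fall-through instructions is what a recursive-traversal (recursive-descent) disassembler does: instead of decoding linearly, it follows jumps and queues up branch and call targets. Below is a minimal Python sketch of that idea over a hypothetical toy instruction set of fixed-length opcodes. It assumes a `call` returns to the next instruction (usually, but not always, true) and does not handle indirect jumps or calls (e.g. `jmp eax`, jump tables) or exception handlers, which need further analysis.

```python
from collections import deque

# Hypothetical toy instruction set: opcode -> (length, control-flow kind).
# Only fixed-length opcodes are included so the example stays short.
TOY = {
    0x90: (1, "fall"),    # nop
    0x55: (1, "fall"),    # push ebp
    0xB8: (5, "fall"),    # mov eax, imm32
    0xC3: (1, "ret"),     # ret: execution does not continue at the next byte
    0xEB: (2, "jmp"),     # jmp rel8
    0xE9: (5, "jmp"),     # jmp rel32
    0x74: (2, "jcc"),     # je rel8: either the target or the next instruction
    0xE8: (5, "call"),    # call rel32: the target, then (assumed) the next instruction
}


def traverse(code: bytes, entry: int) -> list:
    """Return the sorted offsets of instructions reachable from `entry`."""
    seen = {}                      # offset -> instruction length
    work = deque([entry])          # offsets still to explore
    while work:
        pos = work.popleft()
        while 0 <= pos < len(code) and pos not in seen:
            if code[pos] not in TOY:
                raise ValueError(f"opcode {code[pos]:#04x} at offset {pos} is not in the toy set")
            length, kind = TOY[code[pos]]
            seen[pos] = length
            nxt = pos + length
            if kind == "ret":
                break              # this path ends here
            if kind in ("jmp", "jcc", "call"):
                # Relative operands are signed and measured from the next instruction.
                target = nxt + int.from_bytes(code[pos + 1:nxt], "little", signed=True)
                if kind == "jmp":
                    pos = target   # unconditional: only the target is reachable
                    continue
                work.append(target)  # explore the other path later
            pos = nxt              # fall through
    return sorted(seen)
```

In a buffer that starts with `jmp +5` over five bytes of data, the traversal never decodes those data bytes, whereas a linear sweep from the same start would try to decode them as instructions.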