Is pop eip a legal instruction?


I'm working on a theoretical test I got at university, and was asked this question: after some instruction, esp grew by 4 and eip grew by 20; what could the instruction possibly be? I marked both "pop eip" and "ret". Is it possible to execute a pop eip instruction in 32-bit NASM assembly?


Solution

  • pop eip is not a real x86 instruction. No assembler will assemble it, AFAIK.

    It's pseudocode for explaining what ret does (see the Operation section of the ret entry in the manual), specifically a normal "near" ret; far jmp/call/ret are basically unused in "normal" 32-bit code.

    ret has its own opcode separate from any of the encodings for pop, and x86 also chooses to give it a separate mnemonic. It would have been a valid design for pop eip to also be accepted as another name for the 0xc3 opcode. x86 does have the mov mnemonic overloaded for many different opcodes, including mov to/from control registers, mov to/from debug registers, as well as the standard mov between integer registers and/or memory or immediate. (That "standard" form of mov also has several different opcodes to choose from, though.)

    But that would have been a bit weird, because push eip doesn't exist except as call +0 (a call with zero displacement, i.e. to the next instruction), which has the performance side-effect of being a jump.

    EIP is not one of the 8 general-purpose integer registers, so the normal encodings of pop can't encode a ret. That's one reason why ret needs its own opcode, and why it makes sense for it to have a separate mnemonic in asm source. x86 instructions encode explicit register operands as 3-bit numbers, or optionally 4-bit on x86-64. Other registers are implicit sources or destinations, like EDX:EAX for mul or div, or EFLAGS for pushf: the register is just implied by that opcode, without any bits that specifically mean EFLAGS.
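
    To make the encoding argument concrete, here is a small Python sketch (Python only because it's easy to run; the names REG32, POP_R32_BASE and encode_pop_r32 are made up for this illustration). It builds the short-form pop r32 encoding, which is opcode 0x58 plus the 3-bit register number, and shows that there's simply no number for EIP:

```python
# Register numbers used by x86 in 32-bit mode (3-bit field, values 0..7).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

POP_R32_BASE = 0x58  # short-form "pop r32" is opcode 0x58 + register number


def encode_pop_r32(reg: str) -> bytes:
    """Return the one-byte encoding of `pop reg`; raise if reg has no number."""
    if reg not in REG32:
        raise ValueError(f"{reg} is not one of the 8 general-purpose registers")
    return bytes([POP_R32_BASE + REG32.index(reg)])


if __name__ == "__main__":
    for r in REG32:
        print(f"pop {r}: {encode_pop_r32(r).hex()}")
    try:
        encode_pop_r32("eip")
    except ValueError as e:
        print("pop eip:", e)
```

    All eight values of the register field are already taken, which is why ret has to live at a separate opcode.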


    ret isn't magic: all it does is pop the stack and use the result as a jump target. It's up to the programmer to make sure ESP is pointing at an address you want to jump to, typically a return address.
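
    As a rough illustration of "pop the stack and use the result as a jump target", here's a toy Python model (not real machine code; near_ret, the flat bytearray "memory" and the addresses are all made up for this sketch) applied to the numbers from the question:

```python
import struct
from typing import Tuple


def near_ret(memory: bytearray, esp: int) -> Tuple[int, int]:
    """Model a 32-bit near `ret`: pop a dword into EIP.

    Returns (new_eip, new_esp). Nothing here knows about any earlier call.
    """
    (new_eip,) = struct.unpack_from("<I", memory, esp)  # little-endian dword at [esp]
    return new_eip, esp + 4


if __name__ == "__main__":
    eip = 0x1000              # hypothetical address of the ret instruction
    esp = 0x20                # hypothetical stack pointer
    stack = bytearray(0x40)   # toy memory
    struct.pack_into("<I", stack, esp, eip + 20)  # whatever is at [esp] becomes EIP

    new_eip, new_esp = near_ret(stack, esp)
    print(new_esp - esp, new_eip - eip)  # prints: 4 20
```

    The model never looks at how the value got onto the stack, which is the point: if something other than a return address is at [esp], ret jumps there anyway.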

    Some beginners fail to understand this and think that ret will magically return to the last call, so they don't make the connection between faulting on ret and their code messing up the stack.

    I've definitely written something like "ret is the name we use on x86 for pop eip" in SO answers and comments many times.

    Fun fact: on ARM 32-bit, the program counter is one of the 16 integer registers, r15, so you really can pop {r4, pc} to restore a saved R4 and pop a saved lr (link register = return address) into the program counter all in one instruction. So ARM literally can do the equivalent of pop eip with the same opcode it uses for popping general-purpose integer registers.


    "esp grew by 4 and eip grew by 20"

    Yes, I think C3 (ret) and C2 00 00 (ret 0) are the only two encodings that could do this, and both use the ret mnemonic.
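
    A small Python sketch of how much each near-ret encoding adds to ESP in 32-bit mode (ret_esp_delta is a made-up helper name; it only recognizes the plain C3 and C2 imm16 forms, with no prefixes):

```python
def ret_esp_delta(encoding: bytes) -> int:
    """ESP change for a 32-bit near ret, given its machine-code bytes."""
    if encoding == b"\xc3":                          # ret
        return 4
    if len(encoding) == 3 and encoding[0] == 0xC2:   # ret imm16
        imm16 = int.from_bytes(encoding[1:], "little")
        return 4 + imm16                             # pops EIP, then adds imm16
    raise ValueError("not a plain near ret encoding")


if __name__ == "__main__":
    for enc in (b"\xc3", b"\xc2\x00\x00", b"\xc2\x04\x00"):
        print(enc.hex(), ret_esp_delta(enc))
```

    Only C3 and C2 with a zero immediate leave ESP exactly 4 higher; any other immediate releases extra stack bytes on top of the return address.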

    If EIP had grown by 15 bytes or fewer, a long encoding of add esp, 4 or pop eax could account for it, e.g. with multiple redundant rep and/or fs prefixes and an imm32 encoding for the immediate 4.

    x86 instructions can be at most 15 bytes long; if decoding doesn't reach the end of an instruction before 15 bytes, the CPU takes a #UD exception, just like for other illegal instructions. So changing EIP by 20 bytes with one instruction is only possible with a jump. And the only jump that increases ESP is ret; jmp / jcc leave it unmodified, call pushes a return address.
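
    The length argument can be sketched in Python as well. The helper names below are made up for this illustration, and the padding assumes the CPU accepts redundant fs prefixes as described above:

```python
MAX_INSN_LEN = 15   # architectural limit on x86 instruction length, in bytes
FS_PREFIX = 0x64    # fs segment-override prefix (redundant on pop eax)
POP_EAX = 0x58      # one-byte encoding of pop eax


def padded_pop_eax(total_len: int) -> bytes:
    """Encode pop eax padded to total_len bytes with redundant fs prefixes."""
    if not 1 <= total_len <= MAX_INSN_LEN:
        raise ValueError(f"an instruction can't be {total_len} bytes long")
    return bytes([FS_PREFIX] * (total_len - 1) + [POP_EAX])


def explainable_without_jump(eip_delta: int) -> bool:
    """For a non-jump, EIP advances by exactly the instruction's length."""
    return 1 <= eip_delta <= MAX_INSN_LEN


if __name__ == "__main__":
    print(padded_pop_eax(15).hex())
    print(explainable_without_jump(15), explainable_without_jump(20))
```

    A 15-byte pop eax still moves ESP by 4, but no amount of padding reaches an EIP change of 20, so only a jump is left.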

    iret is almost possible, but it pops at least CS:IP and a FLAGS value (plus a new SS:SP when returning to an outer privilege level, or always in 64-bit mode), so you can't get it to pop just 4 bytes. (Especially in 32-bit mode.)

    sysret doesn't modify ESP, and is only usable by the kernel (ring 0). sysexit (also ring 0 only) sets ESP from ECX and EIP from EDX (RSP from RCX and RIP from RDX with 64-bit operand size), but I'm pretty sure that's not an answer they were looking for. :P

    16-bit retf could work in a very specific case: if EIP was in the low 64 KiB of the 32-bit address space (so only its low 16 bits were non-zero), and the current CS value was already on the stack. 16-bit jumps, including retf, truncate EIP to IP.

    I'm not 100% sure a 66h operand-size override prefix for retf in 32-bit mode will get it to only pop 4 bytes not 6; the pseudocode in https://www.felixcloutier.com/x86/ret says:

       // retf in protected mode, not vm86, RETURN-TO-SAME-PRIVILEGE-LEVEL:
                ELSE (* OperandSize = 16 *)
                        EIP := Pop();
                        EIP := EIP AND 0000FFFFH;
                        CS := Pop(); (* 16-bit pop *)
    

    That first Pop() is being assigned to a 32-bit register; does that make it a 32-bit pop? But this part of the pseudo-code would apply even in 16-bit protected mode.