Tags: assembly, x86-64, disassembly, ida, addressing-mode

CS: override on access to global variables in IDA output, like mov eax, cs:x?


I am writing simple programs and then analyzing them. Today I wrote this:

#include <stdio.h>
 
int x;
 
int main(void){
    printf("Enter X:\n");
 
    scanf("%d",&x);
 
    printf("You enter %d...\n",x);
 
    return 0;
}

It's compiled into this:

push    rbp
mov     rbp, rsp
lea     rdi, s          ; "Enter X:"
call    _puts
lea     rsi, x
lea     rdi, aD         ; "%d"
mov     eax, 0
call    ___isoc99_scanf
mov     eax, cs:x       ; <- don't understand this
mov     esi, eax
lea     rdi, format     ; "You enter %d...\n"
mov     eax, 0
call    _printf
mov     eax, 0
pop     rbp
retn

I don't understand what cs:x means.
I use Ubuntu x64, GCC 10.3.0, and IDA Pro 7.6.


Solution

  • TL;DR: IDA confusingly uses cs: to indicate a RIP-relative addressing mode in 64-bit code.


    In IDA, mov eax, x means what NASM writes as mov eax, DWORD [x], i.e. reading a DWORD from the variable x.
    For completeness, IDA's mov rax, OFFSET x corresponds to NASM's mov rax, x (i.e. putting the address of x in rax).

    In 64-bit mode, displacements are still 32 bits wide, so it's not always possible to address a variable by encoding its absolute address: the address is 64 bits wide and may not fit into a 32-bit field. And in position-independent code, such as a Position Independent Executable, embedding an absolute address is not desirable anyway.
    Instead, RIP-relative addressing is used.
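
To make the arithmetic concrete, here is a minimal C sketch of how a RIP-relative operand resolves to an address. The helper name rip_relative_target and the numbers in the note below are made up for illustration; they are not taken from the binary in the question.

```c
#include <stdint.h>

/* Hypothetical helper (not part of IDA or any library): compute the address
 * that a RIP-relative memory operand refers to.
 *
 * insn_addr: address of the first byte of the instruction
 * insn_len:  total length of the instruction in bytes
 * disp32:    the signed 32-bit displacement stored in the instruction
 */
uint64_t rip_relative_target(uint64_t insn_addr, uint64_t insn_len, int32_t disp32)
{
    /* RIP already points at the NEXT instruction when the operand is resolved. */
    uint64_t next_rip = insn_addr + insn_len;

    /* The displacement is sign-extended to 64 bits before being added. */
    return next_rip + (uint64_t)(int64_t)disp32;
}
```

For instance, a 6-byte instruction at 0x1175 with displacement 0x2EB1 would refer to 0x117B + 0x2EB1 = 0x402C. Because only the distance between the instruction and the variable is encoded, the same bytes keep working wherever the image is loaded.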

    In NASM, RIP-relative addressing takes the form mov eax, [REL x]; in gas, it is mov x(%rip), %eax.
    Also, in NASM, if DEFAULT REL is active, the instruction can be shortened to mov eax, [x] which is identical to the 32-bit syntax.

    Each disassembler will disassemble a RIP-relative operand differently. As you commented, Ghidra gives mov eax, DWORD PTR [x].
    IDA uses mov eax, cs:x to mean mov eax, [REL x]/mov x(%rip), %eax.

    ;IDA listing, 64-bit code
    mov eax, x                ;This is mov eax, [x] in NASM and most likely wrong unless your exec is not PIE and always loaded <= 4GiB
    mov eax, cs:x             ;This is mov eax, [REL x] in NASM and idiomatic to 64-bit programs
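
At the machine-code level, the two lines in the listing differ in the ModRM byte (and, for the absolute form, a SIB byte). The following C sketch classifies a memory operand from those bytes. The function and enum names are hypothetical, and it assumes 64-bit mode with no REX.X bit set and no address-size prefix; it also ignores the separate moffs forms of mov.

```c
#include <stdint.h>

enum addr_form { FORM_RIP_RELATIVE, FORM_ABSOLUTE_DISP32, FORM_OTHER };

/* Hypothetical classifier for a 64-bit-mode memory operand.
 * modrm: the ModRM byte
 * next:  the byte after ModRM (the SIB byte when ModRM.rm == 100b)
 * Assumption: no REX.X bit and no address-size prefix on the instruction. */
enum addr_form classify_modrm(uint8_t modrm, uint8_t next)
{
    uint8_t mod = modrm >> 6;   /* top two bits */
    uint8_t rm  = modrm & 7;    /* low three bits */

    /* mod=00, rm=101 means [rip + disp32] in 64-bit mode. */
    if (mod == 0 && rm == 5)
        return FORM_RIP_RELATIVE;

    /* mod=00, rm=100 means a SIB byte follows. */
    if (mod == 0 && rm == 4) {
        uint8_t index = (next >> 3) & 7;
        uint8_t base  = next & 7;
        /* index=100 (none) and base=101 (none when mod=00): plain [disp32]. */
        if (index == 4 && base == 5)
            return FORM_ABSOLUTE_DISP32;
    }
    return FORM_OTHER;
}
```

Under these assumptions, the byte sequence 8B 05 followed by a disp32 classifies as RIP-relative (what IDA prints as cs:x), while 8B 04 25 followed by a disp32 is the absolute form (what IDA prints as plain x).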
    

    In short, you can mostly ignore the cs: because that's just the way variables are addressed in 64-bit mode.
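    Of course, as the listing above shows, the use or absence of RIP-relative addressing tells you whether the program can be loaded anywhere or only at low addresses (a 32-bit displacement is sign-extended, so strictly speaking the absolute form reaches only the low 2GiB).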


    The cs prefix shown by IDA threw me off.

    I can see that it could mentally resemble "code" and thus the rip register, but I don't think RIP-relative addressing implies a cs segment override.

    In 32-bit mode, the code segment is usually read-only, so an instruction like mov [cs:x], eax will fault.
    In this scenario, putting a cs: in front of the operand would be wrong.

    In 64-bit mode, segment overrides (other than fs/gs) are ignored (and the read-bit of the code segment is ignored anyway), so the presence of a cs: doesn't really matter because ds and cs are effectively indistinguishable. (Even an ss or ds override doesn't change the #GP or #SS exception for a non-canonical address.)
    Probably the AGU doesn't even read the segment shadow registers anymore for segment bases other than fs or gs. (Although even in 32-bit mode, there's a lower latency fast path for the normal case of segment base = 0, so hardware may just let that do its job.)

    Still cs: is misleading in my opinion - a 2E prefix byte is still possible in machine code as padding. Most tools still call it a CS prefix, although http://ref.x86asm.net/coder64.html calls it a "null prefix" in 64-bit mode. There's no such byte here, and cs: is not an obvious or clear way to imply RIP-relative addressing.
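
If you want to check whether an instruction really carries a 2E byte as a prefix, a rough C sketch such as the one below can help. The function names are hypothetical and the example bytes in the note are invented, not dumped from the binary in the question. It only scans the leading legacy-prefix bytes and makes no attempt to decode the rest of the instruction.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if b is one of the x86 legacy prefix bytes. */
static bool is_legacy_prefix(uint8_t b)
{
    switch (b) {
    case 0xF0: case 0xF2: case 0xF3:            /* lock, repne, rep */
    case 0x2E: case 0x36: case 0x3E: case 0x26: /* cs, ss, ds, es */
    case 0x64: case 0x65:                       /* fs, gs */
    case 0x66: case 0x67:                       /* operand-size, address-size */
        return true;
    default:
        return false;
    }
}

/* Hypothetical check: does the instruction starting at insn have a 2E byte
 * among its leading legacy prefixes? Bytes after the prefixes (REX, opcode,
 * ModRM, displacement) are never inspected. */
bool has_cs_prefix(const uint8_t *insn, size_t len)
{
    for (size_t i = 0; i < len && is_legacy_prefix(insn[i]); i++) {
        if (insn[i] == 0x2E)
            return true;
    }
    return false;
}
```

With invented bytes such as 8B 05 B1 2E 00 00 (a RIP-relative mov eax), the function returns false even though 2E happens to appear inside the displacement; prepending a 2E byte to the same sequence makes it return true.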