I am writing simple programs and then analyzing them. Today I've written this:
#include <stdio.h>

int x;

int main(void) {
    printf("Enter X:\n");
    scanf("%d", &x);
    printf("You enter %d...\n", x);
    return 0;
}
It's compiled into this:
push rbp
mov rbp, rsp
lea rdi, s ; "Enter X:"
call _puts
lea rsi, x
lea rdi, aD ; "%d"
mov eax, 0
call ___isoc99_scanf
mov eax, cs:x ; <- don't understand this
mov esi, eax
lea rdi, format ; "You enter %d...\n"
mov eax, 0
call _printf
mov eax, 0
pop rbp
retn
I don't understand what cs:x means.
I use Ubuntu x64, GCC 10.3.0, and IDA pro 7.6.
TL;DR: IDA confusingly uses cs: to indicate RIP-relative addressing in 64-bit code.
In IDA, mov eax, x means mov eax, DWORD [x], which in turn means reading a DWORD from the variable x.
For completeness, mov rax, OFFSET x in IDA means mov rax, x in NASM (i.e. putting the address of x in rax).
In 64-bit mode, displacements are still 32 bits wide, so for a Position Independent Executable it's not always possible to address a variable by encoding its absolute address (the address is 64 bits and may not fit into a 32-bit field). And in position-independent code, an absolute address is not desirable anyway.
Instead, RIP-relative addressing is used.
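To make the size argument concrete, here is a minimal C sketch. The function names and the addresses mentioned in the note below are made up for illustration; they are not taken from the binary in the question.

```c
#include <stdbool.h>
#include <stdint.h>

/* A 32-bit displacement is sign-extended to 64 bits, so as an absolute
   address it can only name the lowest or the highest 2 GiB. */
bool fits_in_disp32(uint64_t address)
{
    return address <= UINT64_C(0x7FFFFFFF) ||
           address >= UINT64_C(0xFFFFFFFF80000000);
}

/* A RIP-relative operand stores the distance from the end of the
   instruction to the target instead. The subtraction wraps modulo 2^64,
   so the same range check applies to the two's-complement distance. */
bool reachable_rip_relative(uint64_t next_insn, uint64_t target)
{
    return fits_in_disp32(target - next_insn);
}
```

With a hypothetical PIE-style address such as 0x555555558014, fits_in_disp32 returns false, while reachable_rip_relative(0x55555555516B, 0x555555558014) returns true because the distance is only 0x2EA9 bytes.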
In NASM, RIP-relative addressing takes the form mov eax, [REL x]; in gas it is mov x(%rip), %eax.
Also, in NASM, if DEFAULT REL is active, the instruction can be shortened to mov eax, [x], which is identical to the 32-bit syntax.
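Whatever the syntax, the CPU resolves such an operand the same way: it adds the sign-extended disp32 to the address of the next instruction. A rough C sketch of that calculation follows; the function name is mine, and the addresses and bytes in the note below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address referenced by a RIP-relative operand. The displacement is
   relative to the END of the instruction (the next RIP), not its start. */
uint64_t rip_relative_target(uint64_t insn_addr, size_t insn_len,
                             const uint8_t disp_bytes[4])
{
    /* opcode + ModRM + disp32 is at least 6 bytes; x86 caps length at 15 */
    assert(insn_len >= 6 && insn_len <= 15);

    /* disp32 is stored little-endian */
    uint32_t raw = (uint32_t)disp_bytes[0]       |
                   (uint32_t)disp_bytes[1] << 8  |
                   (uint32_t)disp_bytes[2] << 16 |
                   (uint32_t)disp_bytes[3] << 24;

    /* sign-extend to 64 bits using only unsigned arithmetic */
    uint64_t disp = raw;
    if (raw & UINT32_C(0x80000000))
        disp |= UINT64_C(0xFFFFFFFF00000000);

    return insn_addr + insn_len + disp; /* wraps modulo 2^64 */
}
```

For instance, assuming the 6-byte encoding 8B 05 A9 2E 00 00 sits at address 0x1165, the operand refers to 0x1165 + 6 + 0x2EA9 = 0x4014.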
Each disassembler renders a RIP-relative operand differently. As you commented, Ghidra gives mov eax, DWORD PTR [x].
IDA uses mov eax, cs:x to mean mov eax, [REL x] / mov x(%rip), %eax.
;IDA listing, 64-bit code
mov eax, x ;This is mov eax, [x] in NASM and most likely wrong unless your exec is not PIE and always loaded <= 4GiB
mov eax, cs:x ;This is mov eax, [REL x] in NASM and idiomatic to 64-bit programs
In short, you can mostly ignore the cs: because that's just the way variables are addressed in 64-bit mode.
Of course, as the listing above shows, the presence or absence of RIP-relative addressing tells you whether the program can be loaded anywhere or only at a low address that fits in a 32-bit displacement.
The cs prefix shown by IDA threw me off.
I can see that it could mentally resemble "code" and thus the rip register, but I don't think RIP-relative addressing implies a cs segment override.
In 32-bit mode, the code segment is usually read-only, so an instruction like mov [cs:x], eax will fault.
In this scenario, putting a cs: in front of the operand would be wrong.
In 64-bit mode, segment overrides (other than fs/gs) are ignored (and the read-bit of the code segment is ignored anyway), so the presence of a cs: doesn't really matter because ds and cs are effectively indistinguishable. (Even an ss or ds override doesn't change the #GP or #SS exception for a non-canonical address.)
Probably the AGU doesn't even read the segment shadow registers anymore for segment bases other than fs or gs. (Although even in 32-bit mode, there's a lower latency fast path for the normal case of segment base = 0, so hardware may just let that do its job.)
Still, cs: is misleading in my opinion: a 2E prefix byte is still possible in machine code as padding. Most tools still call it a CS prefix, although http://ref.x86asm.net/coder64.html calls it a "null prefix" in 64-bit mode. There's no such byte here, and cs: is not an obvious or clear way to imply RIP-relative addressing.
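If you want to check this against the raw bytes, the following C sketch shows what to look for. It is only an illustration: the helper names are mine, the prefix scan is deliberately simplified, and the example bytes in the note are an assumed encoding since the IDA listing above doesn't show opcode bytes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* In 64-bit mode, a ModRM byte with mod == 00 and r/m == 101 selects
   RIP + disp32. No segment prefix is involved in that choice. */
bool modrm_is_rip_relative(uint8_t modrm)
{
    return (modrm & 0xC7) == 0x05;
}

/* True if a 0x2E byte (CS override, or "null prefix" in 64-bit mode)
   appears among the legacy prefixes at the start of the instruction. */
bool has_cs_prefix(const uint8_t *insn, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        switch (insn[i]) {
        case 0x2E:
            return true;
        case 0x26: case 0x36: case 0x3E: case 0x64: case 0x65: /* other segment prefixes */
        case 0x66: case 0x67:                                   /* operand/address size */
        case 0xF0: case 0xF2: case 0xF3:                        /* lock, repne, rep */
            continue;     /* still in the prefix area, keep scanning */
        default:
            return false; /* reached REX or the opcode: no 2E prefix */
        }
    }
    return false;
}
```

For a plausible encoding such as 8B 05 A9 2E 00 00, has_cs_prefix returns false even though a 2E byte happens to occur inside the displacement, and modrm_is_rip_relative(0x05) returns true. That matches the point above: the RIP-relative form comes from the ModRM byte, not from a cs prefix.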