Tags: c, linux, qemu, instructions, ptrace

Why does qemu sometimes count more and sometimes fewer instructions than ptrace?


I want to compare the registers after each executed instruction with the register dumps made by qemu. To do this, I wrote a program that uses ptrace to single-step through a program and can dump the registers after each instruction. I have simplified it to work only for /bin/ls, and instead of dumping the registers it only counts the number of instructions executed.

SPOILER: The instruction counts of qemu and ptrace do not match and differ by a few thousand instructions.

Here is the code I wrote:

#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <sys/user.h>
#include <sys/reg.h>    
#include <sys/syscall.h>

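int main(void)
{
    pid_t child = fork(); // create child
    if (child == -1) {
        perror("fork");
        return 1;
    }

    if (child == 0) {
        // child: ask to be traced, then replace the process image with /bin/ls
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        char *child_argv[] = {"/bin/ls", NULL};
        execv("/bin/ls", child_argv);
        perror("execv"); // only reached if execv failed
        _exit(127);
    }
    else {
        int status;
        long long ins_count = 0;
        while (1)
        {
            // wait for the next stop of the child
            if (waitpid(child, &status, 0) == -1) {
                perror("waitpid");
                return 1;
            }
            // stop tracing once the child has exited or was killed by a signal
            if (WIFEXITED(status) || WIFSIGNALED(status))
                break;

            // every stop is followed by exactly one single-step request
            ins_count++;
            if (ptrace(PTRACE_SINGLESTEP, child, NULL, NULL) == -1) {
                perror("ptrace(PTRACE_SINGLESTEP)");
                return 1;
            }
        }

        printf("\n%lld Instructions executed.\n", ins_count);
    }

    return 0;
}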

Running this piece of code gives me 492611 Instructions executed. I am aware that most of these instructions come from the dynamic linker doing its job. If I dumped the registers after each instruction, I would have the first set of register dumps for /bin/ls.

Now I wanted to dump the registers after each instruction with qemu. I used the following command to single-step through each instruction of /bin/ls and to dump the register state before entering each translation block. I disabled translation block chaining so that qemu dumps the registers before every instruction.

qemu-x86_64 -singlestep -D logfile -d nochain,cpu /bin/ls

Looking at the logfile, the register dump for each instruction consists of 20 lines, for example:

RAX=0000000000000000 RBX=0000000000000000 RCX=0000000000000000 RDX=0000000000000000
RSI=0000000000000000 RDI=0000000000000000 RBP=0000000000000000 RSP=0000004000805180
R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
RIP=0000004000807100 RFL=00000202 [-------] CPL=3 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 00000000 00000000
CS =0033 0000000000000000 ffffffff 00effb00 DPL=3 CS64 [-RA]
SS =002b 0000000000000000 ffffffff 00cff300 DPL=3 DS   [-WA]
DS =0000 0000000000000000 00000000 00000000
FS =0000 0000000000000000 00000000 00000000
GS =0000 0000000000000000 00000000 00000000
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
GDT=     0000004000837000 0000007f
IDT=     0000004000836000 000001ff
CR0=80010001 CR2=0000000000000000 CR3=0000000000000000 CR4=00000220
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=0000000000000000 CCD=0000000000000000 CCO=EFLAGS  
EFER=0000000000000500

I counted the lines of the logfile with:

wc -l logfile

This gave me 9.873.540 lines which results in register dumps for 9.873.540/20 = 493.677 instructions.
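Dividing the line count by 20 assumes that every dump has exactly 20 lines and that the log contains nothing else. A slightly more robust count, sketched below, counts the lines that start with "RIP=", since the dump format shown above has exactly one such line per dump. The function name count_register_dumps is made up for this illustration and is not part of qemu.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count register dumps in a qemu "-d cpu" log by counting the lines that
 * begin with "RIP=". Assumption: each dump contains exactly one such line,
 * as in the example dump above. */
long long count_register_dumps(FILE *log)
{
    assert(log != NULL);

    char chunk[512];
    long long dumps = 0;
    int at_line_start = 1;

    while (fgets(chunk, sizeof chunk, log) != NULL) {
        if (at_line_start && strncmp(chunk, "RIP=", 4) == 0)
            dumps++;
        /* fgets may return only part of a long line; the next chunk starts
         * a new line only if this chunk ended with a newline. */
        at_line_start = (strchr(chunk, '\n') != NULL);
    }
    return dumps;
}
```

Calling it on a FILE* opened with fopen("logfile", "r") should give the number of dumps regardless of any other lines in the log.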

So for /bin/ls qemu counts 1.066 instructions more than my ptrace program. I did the same thing for a "return null" program and for a program that prints the numbers 0-9. The results are the following:

returnnull:
qemu counts 105.351 instructions vs ptrace counts 109.308 -> qemu counts 3.957 instructions fewer than ptrace

printf 0-9:
qemu counts 2.188.344 instructions vs ptrace counts 2.194.793 -> qemu counts 6.449 instructions fewer than ptrace

Why don't qemu and ptrace count exactly the same instructions? Why does qemu sometimes count more and sometimes fewer instructions than ptrace? What can I do to get register dumps of the same instructions so that I can compare them?


Solution

  • You should be able to answer this yourself if you actually dump the instruction addresses when you single-step with ptrace, do some basic text processing, and run diff -u (be sure to also turn off ASLR, e.g. by running under setarch linux64 -R ...).

    One possibility might be that different code actually gets executed in the startup sequence (either in the dynamic linker or things reached via __libc_start_main or equivalent) due to different entry point state (auxv, etc.) when the program is loaded by the kernel vs by qemu. One quick way to reduce this would be to test with static linking. If that eliminates the difference it's probably the sole cause; if it just changes it then there are probably multiple causes involved.
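As a rough sketch of the first suggestion, the tracer from the question could log the instruction pointer at every stop instead of only counting. The version below assumes x86-64 Linux (it reads the rip field of struct user_regs_struct via PTRACE_GETREGS). The function name trace_rips and its parameters are invented for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* Single-step the program at `path` (x86-64 Linux only) and write the
 * address of each instruction about to execute to `out`, one hex value per
 * line. Returns the number of addresses written, or -1 on error. */
long long trace_rips(const char *path, char *const argv[], FILE *out)
{
    assert(path != NULL && argv != NULL && out != NULL);

    fflush(NULL);                 /* do not duplicate buffered output in the child */
    pid_t child = fork();
    if (child == -1)
        return -1;

    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(126);
        execv(path, argv);
        _exit(127);               /* only reached if execv failed */
    }

    long long count = 0;
    int status;
    for (;;) {
        if (waitpid(child, &status, 0) == -1)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;

        /* The child is stopped: read its registers and log the program counter. */
        struct user_regs_struct regs;
        if (ptrace(PTRACE_GETREGS, child, NULL, &regs) == -1)
            return -1;
        fprintf(out, "%llx\n", (unsigned long long)regs.rip);
        count++;

        if (ptrace(PTRACE_SINGLESTEP, child, NULL, NULL) == -1)
            return -1;
    }
    return count;
}
```

The resulting address list can then be diffed against the RIP values extracted from the qemu log. Note that qemu may load the program and the dynamic linker at different base addresses than the kernel does, so the addresses may need to be converted to offsets before the two traces line up.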