Which instructions does QEMU trace?

I wrote the following piece of code that steps through /bin/ls and counts its instructions:

#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t child = fork(); // create the child that will run /bin/ls

    if (child == -1) {
        perror("fork");
        return 1;
    }

    if (child == 0) {
        // child: ask to be traced, then exec the target program
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        char *child_argv[] = {"/bin/ls", NULL};
        execv("/bin/ls", child_argv);
        perror("execv"); // only reached if execv fails
        _exit(127);
    }
    else {
        int status;
        long long ins_count = 0;
        while (1)
        {
            // wait for the child to stop after the exec or after a single step
            if (waitpid(child, &status, 0) == -1) {
                perror("waitpid");
                return 1;
            }

            // stop tracing once the child has exited or was killed by a signal
            if (WIFEXITED(status) || WIFSIGNALED(status))
                break;

            ins_count++;
            ptrace(PTRACE_SINGLESTEP, child, NULL, NULL);
        }

        printf("\n%lld Instructions executed.\n", ins_count);
    }

    return 0;
}

Running this code reports about 500,000 executed instructions. As far as I know, most of them should come from the dynamic linker. When I trace /bin/ls with QEMU using qemu-x86_64 -singlestep -D log -d in_asm /bin/ls, I get only about 17,000 instructions. What do I have to adjust so that I start and stop counting at the same points as QEMU (i.e. so that I count the same instructions)?

I traced a "return null" program with QEMU and it resulted in 7840 instructions, while my code gave me 109025, so QEMU seems to trace more than just main but less than my code does.

My goal is to compare these instructions later, which is why I want to iterate through the same ones as QEMU.

Solution

QEMU's "in_asm" logging is not a log of executed instructions. It logs every time an instruction is translated (i.e. when QEMU generates a bit of host code corresponding to it). That translation is then cached, and if the guest loops around and executes the same instruction again, QEMU simply reuses the cached translation, so nothing new is logged by in_asm. Seeing many fewer instructions in the in_asm log is therefore expected.

Logging every executed instruction via the -d options is a bit tricky -- you need to look at the 'cpu' and 'exec' traces, to use the 'nochain' suboption of -d to disable a QEMU optimisation that would otherwise result in some blocks not being logged, to use '-singlestep' to force one instruction per block, and also to account for a few corner cases where we print an execution trace and then don't actually execute the instruction. This is because the -d option is not intended as a way for users to introspect the behaviour of their program -- it is a debug option intended to allow debugging of what QEMU and the guest program are doing together, and so it prints information that requires a little understanding of QEMU internals to interpret correctly.

You might find it simpler to write a QEMU "plugin" instead: https://qemu.readthedocs.io/en/latest/devel/tcg-plugins.html -- this is an API designed to make it fairly straightforward to write instrumentation like "count instructions executed". If you're lucky then one of the sample plugins might even be sufficient for your purposes.