Determining the source of Qemu guest instructions when using in_asm


I'm trying to gather statistics about what percentage of the loaded library code is actually executed. To do this, I run Qemu-user with the -d in_asm flag and log the output to a file, which gives me a sizeable listing of the translated instructions that looks like this:

----------------
IN:
0x4001a0f1e9:  48 83 c4 30              addq     $0x30, %rsp
0x4001a0f1ed:  85 c0                    testl    %eax, %eax
0x4001a0f1ef:  74 b7                    je       0x4001a0f1a8

----------------
IN:
0x4001a0f1f1:  49 8b 0c 24              movq     (%r12), %rcx
0x4001a0f1f5:  48 83 7c 24 50 00        cmpq     $0, 0x50(%rsp)
0x4001a0f1fb:  0f 84 37 01 00 00        je       0x4001a0f338

----------------
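Turning this log into per-block start addresses is a small parsing job. The sketch below assumes each block begins with a line starting with "IN:" (QEMU may append a symbol name after it) and that every instruction line starts with a hex address followed by a colon, as in the excerpt above. parse_in_asm is a hypothetical helper name. Keep in mind that -d in_asm logs a block when it is translated rather than each time it runs (a block can reappear if it is retranslated), so the counts describe code that was reached, not how often it ran.

```python
import re
from typing import List, Optional, Tuple

# An instruction line starts with its guest address, e.g.
# "0x4001a0f1e9:  48 83 c4 30   addq $0x30, %rsp"
INSN_RE = re.compile(r"^0x([0-9a-fA-F]+):")


def parse_in_asm(text: str) -> List[Tuple[int, int]]:
    """Return (start_address, instruction_count) for each logged block."""
    blocks: List[Tuple[int, int]] = []
    start: Optional[int] = None
    count = 0
    for line in text.splitlines():
        if line.startswith("IN:"):
            # "IN:" (possibly followed by a symbol name) opens a new block,
            # so record the previous one first.
            if start is not None:
                blocks.append((start, count))
            start, count = None, 0
            continue
        m = INSN_RE.match(line)
        if m:
            if start is None:
                start = int(m.group(1), 16)  # first instruction = block start
            count += 1
    if start is not None:
        blocks.append((start, count))  # flush the final block
    return blocks


# A fragment of the log shown above.
SAMPLE = """\
----------------
IN:
0x4001a0f1e9:  48 83 c4 30              addq     $0x30, %rsp
0x4001a0f1ed:  85 c0                    testl    %eax, %eax
0x4001a0f1ef:  74 b7                    je       0x4001a0f1a8

----------------
IN:
0x4001a0f1f1:  49 8b 0c 24              movq     (%r12), %rcx
0x4001a0f1f5:  48 83 7c 24 50 00        cmpq     $0, 0x50(%rsp)
0x4001a0f1fb:  0f 84 37 01 00 00        je       0x4001a0f338
"""

for addr, n in parse_in_asm(SAMPLE):
    print(hex(addr), n)
```

The resulting start addresses are what gets compared against the address ranges in /proc/pid/maps.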

To map blocks to their source files, I extract /proc/pid/maps for the Qemu process and compare the addresses of the executed instructions with the address ranges of the files mapped into the guest program. This works reasonably well, but most of the executed instructions seem to lie outside every file listed in the maps file. The last guest mappings in the file are as follows:

.
.
.
40020a0000-4002111000 r--p 00000000 103:02 2622381                       /lib/x86_64-linux-gnu/libpcre.so.3.13.3
4002111000-4002112000 r--p 00070000 103:02 2622381                       /lib/x86_64-linux-gnu/libpcre.so.3.13.3
4002112000-4002113000 rw-p 00071000 103:02 2622381                       /lib/x86_64-linux-gnu/libpcre.so.3.13.3
4002113000-4002115000 rw-p 00000000 00:00 0
555555554000-5555555a1000 r--p 00000000 103:02 12462104                  /home/name/Downloads/qemu-5.2.0/exe/bin/qemu-x86_64

The guest program appears to end at 0x4002115000, leaving a sizeable gap between the guest and Qemu, which begins at 0x555555554000. I can match instructions in the libraries to the actual binaries, so the approach isn't entirely faulty. However, almost 60,000 of the executed blocks originate between 0x400aa20000 and 0x407c8ae138. This region of memory is nominally unmapped, yet Qemu translates and successfully executes code there, and the program appears to run correctly, so I am unsure where these instructions come from. I initially thought it might be the vDSO, but the range is much too large and contains too many distinct addresses. I looked at the code preceding a couple of these blocks and it was in ld.so, but I can't say whether all of the calls originate there. I think it's possible that this is kernel code, but I'm not sure how to verify that. I'm at a loss as to how to approach this problem.
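For reference, the address lookup described above can be sketched as follows, assuming the standard maps line layout (address range, permissions, offset, device, inode, optional pathname). Because mappings in a maps file don't overlap, a binary search over the sorted start addresses finds the candidate mapping for each block. parse_maps and classify are hypothetical names, and the three addresses in the usage example are illustrative (the last one is the start of the puzzling range). Note that a maps snapshot only shows what was mapped at the moment it was read.

```python
import bisect
from collections import Counter
from typing import Iterable, List, Tuple

Mapping = Tuple[int, int, str]  # (start, end, pathname; "" if anonymous)


def parse_maps(text: str) -> List[Mapping]:
    """Parse /proc/<pid>/maps text into mappings sorted by start address."""
    maps: List[Mapping] = []
    for line in text.splitlines():
        # address perms offset dev inode [pathname]
        fields = line.split(maxsplit=5)
        if len(fields) < 5:
            continue  # skip blank lines and anything that isn't a mapping
        start, end = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5].strip() if len(fields) == 6 else ""
        maps.append((start, end, path))
    maps.sort()
    return maps


def classify(maps: List[Mapping], addrs: Iterable[int]) -> Counter:
    """Count addresses per mapped file, "[anonymous]" or "[unmapped]"."""
    starts = [m[0] for m in maps]
    counts: Counter = Counter()
    for addr in addrs:
        # Index of the last mapping whose start is <= addr, or -1.
        i = bisect.bisect_right(starts, addr) - 1
        if i >= 0 and addr < maps[i][1]:
            counts[maps[i][2] or "[anonymous]"] += 1
        else:
            counts["[unmapped]"] += 1
    return counts


# Two lines from the listing above.
SAMPLE_MAPS = """\
4002111000-4002112000 r--p 00070000 103:02 2622381   /lib/x86_64-linux-gnu/libpcre.so.3.13.3
4002113000-4002115000 rw-p 00000000 00:00 0
"""
maps = parse_maps(SAMPLE_MAPS)
print(classify(maps, [0x4002111abc, 0x4002114000, 0x400aa20000]))
```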

Is there a way to trace the provenance of these instructions, perhaps using the gdb stub or some other logging functionality?


Solution

  • By the time you read /proc/pid/maps, the corresponding modules may already have been unloaded. Running LD_DEBUG=files <your qemu command line> prints module loading information, including each module's load address and size. Search that output for the missing code addresses.
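A minimal sketch of that search, assuming glibc's LD_DEBUG=files format, in which a "file=NAME [N];  generating link map" line is followed by a line containing "base: 0x..." and "size: 0x..."; the exact format may differ between glibc versions. parse_ld_debug_files and owners are hypothetical names, and the sample log is made up for illustration, not captured output. Treating base..base+size as the object's address range assumes its first loadable segment is linked at address 0, the usual case for shared libraries. If the Qemu binary itself is dynamically linked, lines from the host's loader may be mixed into the same output, so only matches in the guest's address range are of interest.

```python
import re
from typing import List, Optional, Tuple

# glibc's ld.so prints, for each object it maps (lines prefixed by the PID):
#   file=libfoo.so [0];  generating link map
#     dynamic: 0x...  base: 0x...   size: 0x...
FILE_RE = re.compile(r"file=(\S+) \[\d+\];\s+generating link map")
RANGE_RE = re.compile(r"base: 0x([0-9a-fA-F]+)\s+size: 0x([0-9a-fA-F]+)")


def parse_ld_debug_files(text: str) -> List[Tuple[int, int, str]]:
    """Collect (start, end, name) for every object ld.so reports loading."""
    ranges: List[Tuple[int, int, str]] = []
    current: Optional[str] = None
    for line in text.splitlines():
        m = FILE_RE.search(line)
        if m:
            current = m.group(1)
            continue
        m = RANGE_RE.search(line)
        if m and current is not None:
            base, size = int(m.group(1), 16), int(m.group(2), 16)
            ranges.append((base, base + size, current))
            current = None
    return ranges


def owners(ranges: List[Tuple[int, int, str]], addr: int) -> List[str]:
    """Every reported object whose range contains addr.

    Several names can match if one library was unloaded and another
    was later loaded at the same address.
    """
    return [name for start, end, name in ranges if start <= addr < end]


# Illustrative input only: the PID, library name and addresses are made up.
SAMPLE_LOG = """\
     12345:\tfile=libexample.so [0];  needed by ./prog [0]
     12345:\tfile=libexample.so [0];  generating link map
     12345:\t  dynamic: 0x0000004000000e00  base: 0x0000004000000000   size: 0x0000000000001000
     12345:\t    entry: 0x0000004000000400  phdr: 0x0000004000000040  phnum:                  9
"""

ranges = parse_ld_debug_files(SAMPLE_LOG)
print(owners(ranges, 0x4000000800))
```

Feeding the block addresses that fell outside every mapping in /proc/pid/maps through owners would show which, if any, of the loaded and later unloaded modules they belonged to.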