
How is the first instruction of a virtual machine fetched (in KVM-QEMU)?


I am new to both SO and x86 VMX. I am learning KVM-QEMU on x86, and I want to know in detail how the first instruction of the VM is fetched so that the VM can start running. There are KVM APIs to configure and register a block of memory as the physical memory for the VM and then to set the guest_RIP to AAA (for example). What I don't understand is this: once VMLAUNCH is executed (with a properly configured VMCS), how does the CPU fetch the instruction at that RIP in the VMCS? Does it go through some address translation process, so that the guest_CR3 must be set to point to the HOST memory allocated for the guest? Thanks.


Solution

  • I will explain this in the context of QEMU and how QEMU operates when the KVM accelerator is enabled.

    As you may know, under KVM, virtual machines are created by opening the device node /dev/kvm. A guest has its own physical address space, backed by memory that the userspace process allocates and registers with KVM. So basically KVM is structured as a fairly typical Linux character device: you use ioctl()s to create and run virtual machines, modify their parameters, allocate their memory, and read and write their VCPU registers. Thus, the initial setup is done via various ioctl()s that prepare KVM for further use.
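
    As a rough illustration of that ioctl() flow, here is a minimal C sketch against the Linux KVM API in <linux/kvm.h>. The helper names make_slot and create_vm_with_memory are made up for this example, error handling is reduced to a single failure code, and everything is torn down again right away, so treat it as a sketch of the call sequence rather than what QEMU actually does:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Describe one slot of guest physical memory backed by host user memory. */
struct kvm_userspace_memory_region
make_slot(uint32_t slot, uint64_t guest_phys, void *host_mem, uint64_t size)
{
    struct kvm_userspace_memory_region r;

    assert(host_mem != NULL);
    memset(&r, 0, sizeof r);
    r.slot = slot;
    r.guest_phys_addr = guest_phys;                   /* where the guest sees it */
    r.memory_size = size;
    r.userspace_addr = (uint64_t)(uintptr_t)host_mem; /* where it really lives */
    return r;
}

/* Returns 0 on success, -1 if /dev/kvm is unavailable or any step fails. */
int create_vm_with_memory(size_t size)
{
    struct kvm_userspace_memory_region region;
    void *mem = MAP_FAILED;
    int kvm, vm = -1, vcpu = -1, rc = -1;

    kvm = open("/dev/kvm", O_RDWR);
    if (kvm < 0)
        return -1;
    if (ioctl(kvm, KVM_GET_API_VERSION, 0) != KVM_API_VERSION)
        goto out;

    vm = ioctl(kvm, KVM_CREATE_VM, 0);        /* new, empty VM */
    if (vm < 0)
        goto out;

    /* Guest RAM is ordinary memory in this process's address space. */
    mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        goto out;

    region = make_slot(0, 0x0, mem, size);    /* guest physical 0 -> mem */
    if (ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region) < 0)
        goto out;

    vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);     /* VCPU with id 0 */
    if (vcpu < 0)
        goto out;
    rc = 0;
out:
    if (vcpu >= 0)
        close(vcpu);
    if (vm >= 0)
        close(vm);
    if (mem != MAP_FAILED)
        munmap(mem, size);
    close(kvm);
    return rc;
}
```

    Calling create_vm_with_memory(0x1000) returns 0 on a host where /dev/kvm is usable and -1 otherwise. Note that the memory slot records a guest physical address next to a host userspace address; the guest never sees the latter.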

    In terms of the QEMU code, all execution (whether KVM or non-KVM) starts from:

    vl.c start of everything

    The architecture-specific KVM initialization of each VCPU happens in the function below (collecting CPU flags from CPUID, setting up frequencies, etc.):

    kvm_arch_init_vcpu

    Once all the initialization functions are done, the function do_kvm_cpu_synchronize_post_init synchronizes the initial values of the VCPU registers: it calls kvm_arch_put_registers to push QEMU's own copy of the CPU state into KVM, and then clears the VCPU's dirty flag. Why does the dirty flag matter? A VCPU is dirty when QEMU's copy of its registers is newer than the kernel's; only then will the subsequent functions actually write the register values back to KVM before the VCPU runs.

    This function, kvm_arch_put_registers, is the key to how the initial register values reach the VMCS: QEMU hands them to KVM, which loads them into the guest-state area of the VMCS. If you look at its body, you will see what is happening:

    kvm_arch_put_registers

    Focus specifically on the functions kvm_getput_regs and kvm_put_sregs: the first sets up the initial values of the GPRs, EFLAGS, and EIP/RIP, while the second sets up the initial segment register values, along with the control registers (including CR0 and CR3) and the descriptor tables.
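
    To make the role of these two calls concrete, here is a hedged C sketch that talks to KVM directly instead of going through QEMU. It assumes an x86 Linux host, uses the real KVM_SET_REGS, KVM_GET_SREGS, and KVM_SET_SREGS ioctls, and invents the helper names initial_regs, flatten_real_mode, and load_initial_state. It also assumes a flat real-mode start (CS base 0, paging off), which is a common choice for minimal test guests and not necessarily the state QEMU sets up:

```c
#include <assert.h>
#include <linux/kvm.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* General-purpose state: everything zero except RIP and RFLAGS. */
struct kvm_regs initial_regs(uint64_t entry_rip)
{
    struct kvm_regs regs;

    memset(&regs, 0, sizeof regs);
    regs.rip = entry_rip;   /* address of the first instruction to fetch */
    regs.rflags = 0x2;      /* bit 1 of RFLAGS is reserved and always set */
    return regs;
}

/* Special registers: flat real mode, so CS base + RIP is a guest physical
 * address and CR3 is not consulted for the first fetch. */
void flatten_real_mode(struct kvm_sregs *sregs)
{
    assert(sregs != NULL);
    sregs->cs.base = 0;
    sregs->cs.selector = 0;
    sregs->cr0 &= ~(1ULL << 31);   /* clear CR0.PG: paging off */
    sregs->cr0 &= ~1ULL;           /* clear CR0.PE: real mode */
}

/* Push the initial state into an existing VCPU; returns 0 or -1. */
int load_initial_state(int vcpu_fd, uint64_t entry_rip)
{
    struct kvm_regs regs = initial_regs(entry_rip);
    struct kvm_sregs sregs;

    /* Start from KVM's current values and change only what we need. */
    if (ioctl(vcpu_fd, KVM_GET_SREGS, &sregs) < 0)
        return -1;
    flatten_real_mode(&sregs);
    if (ioctl(vcpu_fd, KVM_SET_SREGS, &sregs) < 0)
        return -1;
    if (ioctl(vcpu_fd, KVM_SET_REGS, &regs) < 0)
        return -1;
    return 0;
}
```

    With CR0.PG clear, the address formed from the CS base and RIP is used as a guest physical address directly, so the guest CR3 plays no part in the first fetch.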

    The guest page table is rooted at the guest's CR3 register. How does this page table fit in?

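    For this, you need to remember that the guest's own page table only accounts for one level of translation (guest virtual -> guest physical) and does not account for the second level (guest physical -> host physical). The guest CR3 therefore holds a guest physical address and never points at host memory directly. Also note that an x86 CPU comes out of reset in real mode with paging disabled, so the very first fetch does not go through the guest page table at all: the address formed from the CS base and RIP is already a guest physical address. The RIP is only treated as a virtual address, and translated to a guest physical address through the guest page table, once the guest has enabled paging. Either way, converting the guest physical address to a host physical address needs a separate structure. Without hardware support, this is a shadow page table: KVM builds it by combining the guest page table (guest virtual -> guest physical) with its own guest physical -> host physical mapping, and the hardware walks the shadow table instead of the guest's. On CPUs with EPT (Intel) or NPT (AMD), the hardware performs the second level itself and no shadow page table is needed.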
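
    The two levels can be modeled with a toy C example. Everything here is hypothetical (single-level tables of 16 pages, the toy_* names, the guest_paging switch); real x86 page tables and EPT are multi-level and are walked by hardware. The sketch only shows how the two mappings compose, and that the first one is skipped while the guest has paging disabled:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_MASK  ((1ULL << PAGE_SHIFT) - 1)
#define NPAGES     16
#define UNMAPPED   UINT64_MAX

/* Single-level toy table: index = source page, value = target page. */
typedef struct {
    uint64_t pfn[NPAGES];
} toy_table;

void toy_table_init(toy_table *t)
{
    assert(t != NULL);
    for (size_t i = 0; i < NPAGES; i++)
        t->pfn[i] = UNMAPPED;
}

/* Translate addr through one table; UNMAPPED models a page fault. */
uint64_t toy_walk(const toy_table *t, uint64_t addr)
{
    uint64_t page = addr >> PAGE_SHIFT;

    if (page >= NPAGES || t->pfn[page] == UNMAPPED)
        return UNMAPPED;
    return (t->pfn[page] << PAGE_SHIFT) | (addr & PAGE_MASK);
}

/* guest virtual -> guest physical -> host physical.
 * With guest paging off, the first stage is the identity. */
uint64_t toy_translate(const toy_table *guest_pt, const toy_table *gpa_to_hpa,
                       int guest_paging, uint64_t gva)
{
    uint64_t gpa = guest_paging ? toy_walk(guest_pt, gva) : gva;

    if (gpa == UNMAPPED)
        return UNMAPPED;
    return toy_walk(gpa_to_hpa, gpa);
}
```

    For example, if the guest table maps page 2 to page 5 and the second-level table maps page 5 to host page 0x80, then address 0x2abc resolves to 0x80abc with guest paging on, while with paging off the same address is looked up in the second-level table directly as guest physical page 2.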

    The shadow page table has to be kept in sync with the guest page table, and this is a known source of overhead and complexity. Whenever the guest writes to its page table, the corresponding change must be made in the shadow page table as well.
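
    A toy C model of that synchronization is sketched below. The toy_mmu structure and its functions are invented for illustration, each table has a single level, and a guest page-table write is modeled as a plain function call; in a real hypervisor such writes would have to be intercepted first:

```c
#include <assert.h>
#include <stdint.h>

#define NPAGES 8
#define NONE   UINT32_MAX

typedef struct {
    uint32_t guest_pt[NPAGES];    /* guest virtual page  -> guest physical page */
    uint32_t gpa_to_hpa[NPAGES];  /* guest physical page -> host physical page  */
    uint32_t shadow[NPAGES];      /* guest virtual page  -> host physical page  */
} toy_mmu;

void toy_mmu_init(toy_mmu *m)
{
    for (uint32_t i = 0; i < NPAGES; i++)
        m->guest_pt[i] = m->gpa_to_hpa[i] = m->shadow[i] = NONE;
}

/* Recompute one shadow entry from the two mappings it is derived from. */
void toy_sync_entry(toy_mmu *m, uint32_t vpage)
{
    uint32_t gpage = m->guest_pt[vpage];

    /* NONE is larger than NPAGES, so an unmapped guest entry lands here too. */
    m->shadow[vpage] = (gpage < NPAGES) ? m->gpa_to_hpa[gpage] : NONE;
}

/* Models an intercepted guest write to one of its page-table entries:
 * the guest's table is updated and the shadow entry follows immediately. */
void toy_guest_pt_write(toy_mmu *m, uint32_t vpage, uint32_t gpage)
{
    assert(vpage < NPAGES);
    m->guest_pt[vpage] = gpage;
    toy_sync_entry(m, vpage);
}
```

    If the shadow entry were not refreshed in toy_guest_pt_write, the hardware would keep using the stale guest virtual -> host physical mapping after the guest remapped the page.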