linux memory-management x86-64 virtualization

What is the relation between EPT PTE and host PTE entry?


I am trying to understand the relation between EPT PTEs and host PTEs on an x86 Linux host with virtualization.
For example, when the hypervisor sets up an EPT entry that points to a host memory page, what happens when the guest writes to that page?
In that case the EPT entry is marked dirty, but is the host PTE for that host page also marked dirty?

I wrote a simple hypervisor for Linux that supports EPT. I found that when the guest writes to a page, the dirty bit is set in the EPT entry, but when I checked the host PTE, the dirty bit was NOT set.

In the EPT violation handler, I call kmalloc to get a host page for the guest. I then use the following code to check the host PTE for that page.

void pgtable_walk(unsigned long addr)
{
    pgd_t *pgd;
    pud_t *pud;
    pmd_t *pmd;
    pte_t *pte;
    unsigned long page_mask;
    unsigned int level;
    phys_addr_t phys_addr;
    unsigned long offset;

    pgd = pgd_offset(current->mm, addr);
    printk(KERN_ALERT "pgd is : %lx\n", (unsigned long)pgd->pgd);
    printk(KERN_ALERT "pgd index: %lx\n", (unsigned long)pgd_index(addr));
    pud = pud_offset(pgd, addr);
    printk(KERN_ALERT "pud is : %lx\n", (unsigned long)pud->pud);
    printk(KERN_ALERT "pud index: %lx\n", (unsigned long)pud_index(addr));
    pmd = pmd_offset(pud, addr);
    printk(KERN_ALERT "pmd is : %lx\n", (unsigned long)pmd->pmd);
    printk(KERN_ALERT "pmd index: %lx\n", (unsigned long)pmd_index(addr));
    if(!pmd_large(*pmd)) {
        pte = pte_offset_kernel(pmd, addr);
        printk(KERN_ALERT "pte is : %lx\n", (unsigned long)pte->pte);
        printk(KERN_ALERT "pte index: %lx\n", (unsigned long)pte_index(addr));
        level = PG_LEVEL_4K;    /* regular 4 KiB page */
    } else {
        /* 2 MiB page: the PMD entry is itself the leaf entry */
        pte = (pte_t *)pmd;
        level = PG_LEVEL_2M;
    }
    page_mask = page_level_mask(level);
    phys_addr = pte_pfn(*pte) << PAGE_SHIFT;
    offset    = addr & ~page_mask;

    printk(KERN_ALERT "Final Phys Addr: %lx, dirty=%x, pte=%lx\n",
            (unsigned long)(phys_addr | offset), pte_dirty(*pte), pte_val(*pte));
}

If the host PTE is not marked dirty in this case, how does Linux know which pages are dirty?


Solution

  • The processor can only set the dirty bit in the PTEs that are used to translate the virtual address when the write is performed. So when a guest writes to a page, the processor sets the dirty bits in the guest PTE and in the EPT.* At the time a write happens in a guest, the processor doesn’t have a pointer to the host page tables, nor does it know whether the page is even mapped in any host page tables. So if software in the host wants to find out if the page is dirty, it must look at the EPT.

    * The EPT dirty bit is set only if the optional EPT A/D feature is available and is enabled by setting bit 6 in the EPTP field in the VMCS. (See section 28.2.5 of the Intel SDM.)
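
To make the two points above concrete, here is a minimal sketch in plain C of enabling EPT A/D in the EPTP and then reading dirty state back out of the EPT. The helper names (`ept_make_eptp`, `ept_harvest_dirty`) and the flat 512-entry table are assumptions for illustration, not part of any kernel or hypervisor API. The bit positions are the ones I understand the Intel SDM to define: EPTP bit 6 enables A/D, IA32_VMX_EPT_VPID_CAP bit 21 reports support, and bits 8 and 9 of a leaf entry are the accessed and dirty flags.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* EPTP fields: memory type in bits 2:0, page-walk length minus one in
 * bits 5:3, accessed/dirty enable in bit 6. */
#define EPTP_MEMTYPE_WB   6ULL
#define EPTP_WALK_4LEVEL  (3ULL << 3)
#define EPTP_AD_ENABLE    (1ULL << 6)

/* IA32_VMX_EPT_VPID_CAP (MSR 0x48C), bit 21: EPT A/D flags supported. */
#define EPT_CAP_AD        (1ULL << 21)

/* Flags in an EPT leaf entry (only meaningful when A/D is enabled). */
#define EPT_ACCESSED      (1ULL << 8)
#define EPT_DIRTY         (1ULL << 9)

#define EPT_ENTRIES       512

/* Hypothetical helper: build an EPTP for a 4-level EPT rooted at the
 * 4 KiB-aligned physical address pml4_phys. A/D is enabled only if the
 * capability MSR value passed in reports support for it. */
static uint64_t ept_make_eptp(uint64_t pml4_phys, uint64_t ept_vpid_cap)
{
    uint64_t eptp = pml4_phys | EPTP_MEMTYPE_WB | EPTP_WALK_4LEVEL;

    if (ept_vpid_cap & EPT_CAP_AD)
        eptp |= EPTP_AD_ENABLE;
    return eptp;
}

/* Hypothetical helper: scan one EPT page table of leaf entries, clear
 * every dirty flag found, and record the entry index in dirty_bitmap.
 * Returns the number of dirty entries. A real hypervisor would clear
 * the flag atomically (the CPU may set it concurrently) and would
 * invalidate cached EPT translations (INVEPT) afterwards. */
static size_t ept_harvest_dirty(uint64_t *table,
                                uint64_t dirty_bitmap[EPT_ENTRIES / 64])
{
    size_t count = 0;

    for (size_t i = 0; i < EPT_ENTRIES; i++) {
        if (table[i] & EPT_DIRTY) {
            table[i] &= ~EPT_DIRTY;
            dirty_bitmap[i / 64] |= 1ULL << (i % 64);
            count++;
        }
    }
    return count;
}
```

The host page tables never appear in this sketch, which is the point: the host learns about guest writes only by inspecting the EPT entries. What the hypervisor then does with the harvested bits, for example marking the backing host page dirty, is up to the hypervisor.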