
Linux Kernel Module: Setting CR4.VMXE does not persist


I'm experimenting with VMX on Xubuntu 16.04, but I'm running into a problem with setting the VMXE bit of CR4: by the time my exit function is called, the bit is no longer set.

vmmod.c

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/types.h>
#include <linux/smp.h>      /* on_each_cpu() */

#define AUTHOR "me"
#define DESC "Test"

extern u64 read_cr4(void);
extern void write_cr4(u64 val);

/* Note: this reads CR4 only on the CPU the caller happens to be running on. */
static bool IsVMXEEnabled(void)
{
    return (read_cr4() >> 13) & 1;
}

static void SetVMXEEnabled(void* _val)
{
    bool val = *(bool*)_val;
    u64 mask = (1ULL << 13);    /* CR4.VMXE is bit 13 */
    u64 cr4 = read_cr4();

    if (val)
        cr4 |= mask;
    else
        cr4 &= (~mask);

    write_cr4(cr4);
}

static void LogVMXEState(void* info)
{
    (void) info;
    printk(KERN_INFO "CR4: %08llX\n", read_cr4());
}

static int __init init_(void)
{
    printk(KERN_INFO "===================================\n");

    if (IsVMXEEnabled())
        printk(KERN_INFO "VMXE Is Enabled\n");
    else
    {
        bool new_vmxe_state = true;
        printk(KERN_INFO "Enabling VMXE\n");
        on_each_cpu(SetVMXEEnabled, &new_vmxe_state, 1);

        if (IsVMXEEnabled())
        {
            printk(KERN_INFO "VMXE Has Been Enabled\n");
            on_each_cpu(LogVMXEState, NULL, 1);
        }
        else
        {
            printk(KERN_INFO "VMXE Could Not Be Enabled\n");
            return -EIO;    /* return a negative errno rather than a bare -1 */
        }
    }
    return 0;
}

static void __exit exit_(void)
{
    printk(KERN_INFO "----------------------------------------\n");

    on_each_cpu(LogVMXEState, NULL, 1);
    if (IsVMXEEnabled())
    {
        bool new_val = false;
        printk(KERN_INFO "Disabling VMXE\n");
        on_each_cpu(SetVMXEEnabled, &new_val, 1);

        if (!IsVMXEEnabled())
            printk(KERN_INFO "VMXE Has Been Disabled\n");
        else
            printk(KERN_INFO "Couldn't disable VMXE...\n");
    }
    else
        printk(KERN_INFO "VMXE Wasn't enabled?\n");

    printk(KERN_INFO "===================================\n");
}

MODULE_LICENSE("GPL");

MODULE_AUTHOR(AUTHOR);
MODULE_DESCRIPTION(DESC);

module_init(init_);
module_exit(exit_);

vmasm.S

.intel_syntax noprefix
.text

.global read_cr4
read_cr4:
    mov rax, cr4
    ret

.global write_cr4
write_cr4:
    mov cr4, rdi
    ret

Makefile

obj-m += testmod.o
testmod-objs := vmmod.o vmasm.o

all:
	$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
	$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

Testing

$> sudo insmod testmod.ko && sudo rmmod testmod

Output

[  607.459248] ===================================
[  607.459256] Enabling VMXE
[  607.459302] VMXE Has Been Enabled
[  607.459311] CR4: 000426E0
[  607.459315] CR4: 000426E0
[  607.459318] CR4: 000426E0
[  607.459321] CR4: 000426F0
[  607.459334] CR4: 000426E0
[  607.459336] CR4: 000426E0
[  607.459338] CR4: 000426E0
[  607.459373] CR4: 000426E0
[  607.473007] ----------------------------------------
[  607.473025] CR4: 000406E0
[  607.473065] CR4: 000406E0
[  607.473068] CR4: 000406F0
[  607.473072] CR4: 000406E0
[  607.473074] CR4: 000406E0
[  607.473078] CR4: 000406E0
[  607.473080] CR4: 000406E0
[  607.473103] CR4: 000406E0
[  607.473121] VMXE Wasn't enabled?
[  607.473129] ===================================

The output shows that bit 13 (VMXE) of CR4 is set on every CPU at the end of the module's load function, but by the time the unload function runs, it is no longer set on any of them.
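
To spell out the bit arithmetic behind that reading of the log, here is a small user-space C sketch. It is separate from the module, and the names `cr4_has_vmxe` and `check_logged_values` are made up for illustration. It tests bit 13 in two CR4 values copied from the output above and checks that they differ only in that bit.

```c
#include <assert.h>

#define CR4_VMXE_BIT 13

/* Returns 1 if bit 13 (VMXE) is set in the given CR4 value, else 0. */
int cr4_has_vmxe(unsigned long cr4)
{
    return (int)((cr4 >> CR4_VMXE_BIT) & 1UL);
}

/* Checks two values taken from the dmesg output (after load, before unload). */
void check_logged_values(void)
{
    assert(cr4_has_vmxe(0x426E0UL) == 1);   /* logged after enabling */
    assert(cr4_has_vmxe(0x406E0UL) == 0);   /* logged at unload */
    /* The two values differ in exactly one bit: bit 13. */
    assert((0x426E0UL ^ 0x406E0UL) == (1UL << CR4_VMXE_BIT));
}
```

Calling `check_logged_values()` from any user-space program passes silently, which matches the interpretation that only VMXE changed between load and unload.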

Is there a kernel module that would periodically reset VMXE? kvm.ko and kvm_intel.ko are unloaded when I run this code, the Intel virtualization settings are enabled in the BIOS, and the CPU supports VMX.

Following the question "Modifying control register in kernel module", I tried using on_each_cpu to set VMXE on each CPU core, but it didn't help.

Any ideas?

Thanks!


Solution

  • The Linux kernel is not deliberately clearing CR4.VMXE. Rather, Linux keeps a cached per-CPU copy of CR4 and uses that cache instead of reading the register, perhaps for performance reasons. Since you wrote the register without changing the cache, VMXE is still zero in the cached copy. The next time the kernel changes any bit in CR4, it computes the new value from the cache and writes it to the register, which clears VMXE as a side effect. If your driver had established a VMXON region, you would instead have seen a kernel panic when the kernel inadvertently cleared CR4.VMXE with an active VMXON region.

    I'm not aware of anything that periodically resets CR4 bits. However, TLB shootdowns are fairly common, and if any of the pages being invalidated are global, the kernel may flush them by clearing CR4.PGE and then writing the cached CR4 value back. I don't know why global pages would be invalidated frequently, but a coworker of mine had to debug an issue, starting around the 4.4.0 series kernels, that was caused by CR4.PGE being cleared, so it definitely happens with some frequency.

    The proper way to enable CR4 feature bits is the way the kernel itself does it, e.g. in arch/x86/kernel/cpu/common.c:

    static __always_inline void setup_smep(struct cpuinfo_x86 *c)
    {
        if (cpu_has(c, X86_FEATURE_SMEP))
            cr4_set_bits(X86_CR4_SMEP);
    }
    

    cr4_set_bits() ends up calling this function:

    void cr4_update_irqsoff(unsigned long set, unsigned long clear)
    {
        unsigned long newval, cr4 = this_cpu_read(cpu_tlbstate.cr4);
    
        lockdep_assert_irqs_disabled();
    
        newval = (cr4 & ~clear) | set;
        if (newval != cr4) {
            this_cpu_write(cpu_tlbstate.cr4, newval);
            __write_cr4(newval);
        }
    }
    

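    Notice that it doesn't call __read_cr4() but rather this_cpu_read(cpu_tlbstate.cr4). This is the cache that must be updated if you want the kernel to stop disabling CR4.VMXE. In your module, that means setting the bit on each CPU with cr4_set_bits(X86_CR4_VMXE) (and clearing it with cr4_clear_bits()) instead of writing CR4 directly, provided your kernel version makes those helpers available to modules.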
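
The interaction described in this answer can be modeled in plain user-space C. This is only a sketch under stated assumptions, and it does not touch the real CR4. `struct cpu_model`, `raw_set_bits`, `cached_set_bits`, `flush_global_tlb` and `vmxe_survives_flush` are hypothetical names invented for illustration. The two struct fields stand in for the hardware register and for `cpu_tlbstate.cr4`, and the global TLB flush is modeled as "clear PGE, then write the cached value back".

```c
#include <assert.h>

#define CR4_PGE  (1UL << 7)    /* page global enable */
#define CR4_VMXE (1UL << 13)   /* VMX enable */

/* Hypothetical model of one CPU: the register plus the kernel's cached copy. */
struct cpu_model {
    unsigned long hw_cr4;      /* stands in for the real CR4 register */
    unsigned long shadow_cr4;  /* stands in for cpu_tlbstate.cr4 */
};

/* What the module in the question does: write the register, bypass the cache. */
static void raw_set_bits(struct cpu_model *cpu, unsigned long mask)
{
    cpu->hw_cr4 |= mask;
}

/* Modeled after cr4_update_irqsoff(): update the cache and the register together. */
static void cached_set_bits(struct cpu_model *cpu, unsigned long mask)
{
    unsigned long newval = cpu->shadow_cr4 | mask;

    if (newval != cpu->shadow_cr4) {
        cpu->shadow_cr4 = newval;
        cpu->hw_cr4 = newval;
    }
}

/* Assumed flush model: start from the cached value, drop PGE, then restore it. */
static void flush_global_tlb(struct cpu_model *cpu)
{
    unsigned long cr4 = cpu->shadow_cr4;

    cpu->hw_cr4 = cr4 & ~CR4_PGE;
    cpu->hw_cr4 = cr4;         /* anything missing from the cache is lost here */
}

/* Returns 1 if VMXE is still set in the modeled register after one flush. */
int vmxe_survives_flush(int use_cache)
{
    /* Start from the CR4 value seen at unload in the question's log. */
    struct cpu_model cpu = { 0x406E0UL, 0x406E0UL };

    assert((cpu.hw_cr4 & CR4_VMXE) == 0);   /* VMXE starts out clear */

    if (use_cache)
        cached_set_bits(&cpu, CR4_VMXE);
    else
        raw_set_bits(&cpu, CR4_VMXE);

    flush_global_tlb(&cpu);
    return (cpu.hw_cr4 & CR4_VMXE) != 0;
}
```

In this model, `vmxe_survives_flush(0)` corresponds to the direct register write from the question and returns 0, while `vmxe_survives_flush(1)` goes through the cache and returns 1.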