I am testing VMX in Linux (Ubuntu 16.04) on a VMware platform.
When the VM runs a long loop, the host Linux hits a CPU soft lockup, as follows:
Jun 11 00:01:13 ubuntu kernel: [ 5624.196130] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 21s! [guest:8297]
Jun 11 00:01:13 ubuntu kernel: [ 5624.196134] Modules linked in: vmm(OE) rfcomm ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc aufs pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) bnep vmw_vsock_vmci_transport vsock crct10dif_pclmul vmw_balloon crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd joydev btusb btrtl btbcm btintel snd_ens1371 input_leds serio_raw bluetooth snd_ac97_codec gameport ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer snd soundcore i2c_piix4 shpchp vmw_vmci 8250_fintek mac_hid kvm_intel kvm irqbypass parport_pc ppdev lp parport autofs4 hid_generic usbhid hid vmwgfx psmouse ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops mptspi mptscsih drm ahci mptbase libahci e1000 scsi_transport_spi pata_acpi fjes [last unloaded: vmm]
Jun 11 00:01:13 ubuntu kernel: [ 5624.196210] CPU: 0 PID: 8297 Comm: guest Tainted: G OEL 4.4.0-127-generic #153-Ubuntu
Jun 11 00:01:13 ubuntu kernel: [ 5624.196212] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
Jun 11 00:01:13 ubuntu kernel: [ 5624.196213] task: ffff880092788000 ti: ffff88007ffb8000 task.ti: ffff88007ffb8000
Jun 11 00:01:13 ubuntu kernel: [ 5624.196215] RIP: 0010:[<ffffffffc0648c7f>] [<ffffffffc0648c7f>] failInvalid+0x18/0xa9 [vmm]
Jun 11 00:01:13 ubuntu kernel: [ 5624.196221] RSP: 0018:ffff88007ffbbe60 EFLAGS: 00000246
Jun 11 00:01:13 ubuntu kernel: [ 5624.196223] RAX: 0000000000000000 RBX: 00000000006041c0 RCX: 0000000000000000
Jun 11 00:01:13 ubuntu kernel: [ 5624.196224] RDX: 0000000000001dfa RSI: 0000000000000000 RDI: 000000000000640a
Jun 11 00:01:13 ubuntu kernel: [ 5624.196225] RBP: ffff88007ffbbe70 R08: 000007ff00000017 R09: 0000000000000010
Jun 11 00:01:13 ubuntu kernel: [ 5624.196226] R10: c093ffffffff0000 R11: 0000000000100000 R12: 00000000006041c0
Jun 11 00:01:13 ubuntu kernel: [ 5624.196227] R13: ffff8800b37d2600 R14: 0000000040087401 R15: 00000000006041c0
Jun 11 00:01:13 ubuntu kernel: [ 5624.196229] FS: 00007f8793426700(0000) GS:ffff8800ba600000(0000) knlGS:0000000000000000
Jun 11 00:01:13 ubuntu kernel: [ 5624.196230] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 11 00:01:13 ubuntu kernel: [ 5624.196231] CR2: 0000000000910000 CR3: 00000000671f2000 CR4: 0000000000362670
Jun 11 00:01:13 ubuntu kernel: [ 5624.196237] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 11 00:01:13 ubuntu kernel: [ 5624.196238] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jun 11 00:01:13 ubuntu kernel: [ 5624.196239] Stack:
Jun 11 00:01:13 ubuntu kernel: [ 5624.196240] 00000000006041c0 00000000006041c0 ffff88007ffbbe88 ffffffffc064a538
Jun 11 00:01:13 ubuntu kernel: [ 5624.196242] ffff8800907ab398 ffff88007ffbbe98 ffffffffc064a83c ffff88007ffbbf08
Jun 11 00:01:13 ubuntu kernel: [ 5624.196244] ffffffff81227f8f 0000000000000002 ffff8800b399e610 ffff88008cf50b10
Jun 11 00:01:13 ubuntu kernel: [ 5624.196246] Call Trace:
Jun 11 00:01:13 ubuntu kernel: [ 5624.196251] [<ffffffffc064a538>] vmm_vcpu_run+0x58/0x310 [vmm]
Jun 11 00:01:13 ubuntu kernel: [ 5624.196254] [<ffffffffc064a83c>] my_ioctl+0x4c/0x50 [vmm]
Jun 11 00:01:13 ubuntu kernel: [ 5624.196258] [<ffffffff81227f8f>] do_vfs_ioctl+0x2af/0x4b0
Jun 11 00:01:13 ubuntu kernel: [ 5624.196260] [<ffffffff81214329>] ? vfs_write+0x149/0x1a0
Jun 11 00:01:13 ubuntu kernel: [ 5624.196262] [<ffffffff81228209>] SyS_ioctl+0x79/0x90
Jun 11 00:01:13 ubuntu kernel: [ 5624.196265] [<ffffffff81850c88>] entry_SYSCALL_64_fastpath+0x1c/0xbb
Jun 11 00:01:13 ubuntu kernel: [ 5624.196267] Code: 00 48 8d 1c 25 74 19 65 c0 0f 78 03 8b 04 25 74 19 65 c0 41 5f 41 5e 41 5d 41 5c 41 5b 41 5a 41 59 41 58 5f 5e 5d 5a 59 5b 58 9d <48> c7 c3 68 01 65 c0 eb 19 4c 8b 23 e8 c0 c4 ff ff 41 89 04 24
The VMM enables VM exit on external interrupts, and there is a timer in the VMM that checks the VM state. If I call touch_softlockup_watchdog() in that timer function, the lockup no longer occurs.
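For reference, the workaround looks roughly like this — a minimal sketch assuming the pre-4.15 timer API that matches the 4.4 kernel in the log; vmm_check_timer() and the one-second period are hypothetical names/values, not the module's real ones:

```c
#include <linux/timer.h>
#include <linux/jiffies.h>
#include <linux/sched.h>   /* touch_softlockup_watchdog() on 4.4 */
#include <linux/nmi.h>     /* ...and on newer kernels */

static struct timer_list vmm_timer;

/* Hypothetical periodic check (pre-4.15 callback signature). */
static void vmm_check_timer(unsigned long data)
{
    /* ... check VM state here ... */

    /* Pet the watchdog so the "stuck for 21s" warning goes away,
     * even though the pCPU is still monopolized by the guest. */
    touch_softlockup_watchdog();

    mod_timer(&vmm_timer, jiffies + HZ);   /* re-arm: once per second */
}

static void vmm_timer_start(void)
{
    setup_timer(&vmm_timer, vmm_check_timer, 0UL);
    mod_timer(&vmm_timer, jiffies + HZ);
}
```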
But even with that, I think other threads/processes still get no chance to be scheduled on the pCPU where the VM is running.
So the question is: how can the VMM make the host Linux scheduler run other entities on that pCPU, instead of letting the VM take over the whole pCPU? By calling schedule(), or something else?
The VM code is loaded and started by a host application, which loads the VM image into a memory region allocated and set up by the VMM. It then sets the vCPU context (GPRs, RIP, segment registers, etc.) and asks the VMM to start the VM via VMLAUNCH and VMRESUME.
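To illustrate that flow, here is a hedged userspace sketch; /dev/vmm, struct vmm_regs, VMM_SET_REGS and VMM_RUN are made-up names standing in for the module's real interface:

```c
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical register-state layout shared with the VMM module. */
struct vmm_regs {
    uint64_t gpr[16];            /* RAX..R15 */
    uint64_t rip, rsp, rflags;
    /* segment registers, control registers, ... */
};

/* Hypothetical ioctl numbers, for illustration only. */
#define VMM_SET_REGS _IOW('V', 0, struct vmm_regs)
#define VMM_RUN      _IO('V', 1)

int main(void)
{
    struct vmm_regs regs;
    int fd = open("/dev/vmm", O_RDWR);
    if (fd < 0)
        return 1;

    /* ... copy the VM image into the memory region set up by the VMM ... */

    memset(&regs, 0, sizeof(regs));
    regs.rip = 0;                /* guest entry point */
    ioctl(fd, VMM_SET_REGS, &regs);

    /* Blocks in the module's my_ioctl()/vmm_vcpu_run() path; the VMM
     * executes VMLAUNCH on first entry and VMRESUME afterwards. */
    ioctl(fd, VMM_RUN, 0);

    close(fd);
    return 0;
}
```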
The easiest way to avoid this is to call schedule() in the VM-exit handling loop in kernel space.
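A minimal sketch of what that can look like inside the module, with vcpu_enter_guest() and handle_vmexit() as hypothetical stand-ins for the real functions:

```c
#include <linux/sched.h>

struct vmm_vcpu;                              /* opaque; defined in the module */
int vcpu_enter_guest(struct vmm_vcpu *vcpu);  /* VMLAUNCH/VMRESUME path */
int handle_vmexit(struct vmm_vcpu *vcpu);     /* dispatch on exit reason */

int vmm_vcpu_run(struct vmm_vcpu *vcpu)
{
    int ret;

    for (;;) {
        ret = vcpu_enter_guest(vcpu);    /* returns on VM exit */
        if (ret)
            break;

        ret = handle_vmexit(vcpu);
        if (ret)
            break;

        /* Give the host scheduler a chance to run other tasks on this
         * pCPU. Without this, a guest busy loop that only exits on
         * interrupts keeps this task in kernel mode indefinitely and
         * trips the soft-lockup watchdog. */
        if (need_resched())
            schedule();                  /* equivalently: cond_resched() */
    }

    return ret;
}
```

KVM takes the same approach: its vCPU run loop calls cond_resched() between guest entries, so a CPU-bound guest cannot monopolize the pCPU.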