
Why was BUG_ON(!in_nmi()) triggered?


I got a kernel BUG, and I don't know why it was triggered.

[ 242.337362] kernel BUG at arch/x86/kernel/cpu/mce/core.c:1364!
[ 242.337366] invalid opcode: 0000 [#1] SMP NOPTI

The system is CentOS 8.5, kernel 4.18.0-348.el8.x86_64, on x86_64.

Line 1364 of core.c is:

    nmi_exit();

(The line above is inside do_machine_check().)

Looking at nmi_exit():

https://elixir.bootlin.com/linux/v4.18/source/include/linux/hardirq.h#L78

    #define nmi_exit()                                      \
        do {                                                \
            trace_hardirq_exit();                           \
            rcu_nmi_exit();                                 \
            BUG_ON(!in_nmi());                              \
            preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET); \
            ftrace_nmi_exit();                              \
            lockdep_on();                                   \
            printk_nmi_exit();                              \
        } while (0)

It looks like I hit this BUG_ON(!in_nmi()), but from reading do_machine_check(), the CPU should still be in NMI context (nmi_enter() is called at line 1255). Why was BUG_ON(!in_nmi()) triggered?

Other details:

  1. The CentOS 8.5 kernel source RPM (4.18.0-348.el8) can be downloaded here:

https://vault.centos.org/8.5.2111/BaseOS/Source/SPackages/kernel-4.18.0-348.el8.src.rpm


Solution

  • I got a kernel BUG, and I don't know why it was triggered.

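    The message invalid opcode: 0000 [#1] SMP NOPTI is how the x86 kernel reports a BUG(). No code accidentally ran a bad instruction: BUG_ON() deliberately executed an invalid opcode because its condition was true.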

    I'll address that, its cause, and how to resolve the issue. First, I'll define some terminology.

    What is a non-maskable interrupt (NMI)?

    An NMI is a hardware interrupt that the operating system (e.g. CentOS 8.5) cannot mask. It is typically used to report serious, often non-recoverable, hardware errors.

    What are some typical uses of an NMI? 1

    • Low-level debugging, such as the early Apple Macintosh's "programmers' button".
    • Memory errors beyond what ECC can correct, which may halt the system.
    • Imminent failure, such as a sudden loss of power, so that the system can be quiesced.
    • Hardware watchdog timers that fire on a regular schedule and panic the system if a check-in is missed.

    Could a second NMI arrive while the first is being processed?

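    Linux has handled nested NMIs on Intel x86 for as long as I can remember. A flaw in that nested NMI handling came to light in 2012. On x86, the iret instruction re-enables NMIs, so if the NMI handler takes a page fault or hits a breakpoint, the iret that returns from that exception lets a second NMI arrive while the first is still running. The NMI handler therefore has to avoid triggering page faults and breakpoints while processing an NMI.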

    Support for nested NMIs on ARM64 and PowerPC was committed to Linux on May 20th, 2020.
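    The nesting change matters because of how the NMI count is stored. In 4.18 the NMI field of preempt_count is a single bit (NMI_BITS = 1 at NMI_SHIFT = 20); later kernels widened it to 4 bits so that nmi_enter() can nest. The user-space sketch below is only an arithmetic model, not kernel code: it shows that two unguarded NMI_OFFSET increments overflow a 1-bit field and leave in_nmi() false. In the real 4.18 kernel, the BUG_ON(in_nmi()) inside nmi_enter() is meant to stop this from happening. The helper in_nmi_after_nesting() is hypothetical.

```c
#include <stdbool.h>

/* NMI field of preempt_count: NMI_SHIFT is 20 in both 4.18 and later
 * kernels; only the field width (NMI_BITS) differs: 1 vs 4. */
#define NMI_SHIFT  20
#define NMI_OFFSET (1UL << NMI_SHIFT)

static unsigned long nmi_mask(unsigned nmi_bits)
{
    return ((1UL << nmi_bits) - 1) << NMI_SHIFT;
}

/* Hypothetical helper: would in_nmi() still be true after `depth`
 * nested NMI_OFFSET increments, ignoring nmi_enter()'s BUG_ON guard? */
bool in_nmi_after_nesting(unsigned nmi_bits, unsigned depth)
{
    unsigned long count = 0;
    for (unsigned i = 0; i < depth; i++)
        count += NMI_OFFSET;  /* what preempt_count_add() does to the NMI field */
    /* A carry out of the field lands in bits the mask does not cover. */
    return (count & nmi_mask(nmi_bits)) != 0;
}
```

    With nmi_bits = 1, one level works but two levels already read as "not in NMI"; with nmi_bits = 4, up to 15 levels fit before the field overflows.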

    What does BUG_ON() do exactly?

    Since Linux 2.6, BUG_ON() has been a debugging macro for when something goes terribly wrong. If the condition passed to the macro is true, it calls BUG(), which on x86 executes ud2, an instruction that is guaranteed to be invalid. The CPU raises an invalid-opcode exception, and the kernel's handler prints the "kernel BUG at file:line!" message followed by an oops. If this happens while the kernel is running on behalf of a process, that process usually dies. If it happens during an NMI, it's far more serious.
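    To see why a BUG_ON() shows up as "invalid opcode", here is a user-space sketch for x86 only. It assumes the simplified definition BUG() = ud2; the real kernel macro also records the file and line in a bug table so that the trap handler can print "kernel BUG at ...". In user space the invalid opcode is delivered as SIGILL instead of producing an oops, and the hypothetical helper bug_on_faults() catches it to report whether the BUG_ON fired.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Simplified x86 BUG(): ud2 is architecturally guaranteed to raise #UD. */
#define BUG()        __asm__ __volatile__("ud2")
#define BUG_ON(cond) do { if (cond) BUG(); } while (0)

static sigjmp_buf recover;

static void on_sigill(int sig)
{
    (void)sig;
    siglongjmp(recover, 1);  /* jump back out of the faulting BUG_ON */
}

/* Hypothetical helper: returns true if BUG_ON(cond) raised an
 * invalid-opcode fault, which user space sees as SIGILL. */
bool bug_on_faults(bool cond)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGILL, &sa, &old);

    if (sigsetjmp(recover, 1) != 0) {
        sigaction(SIGILL, &old, NULL);  /* arrived here via the SIGILL handler */
        return true;
    }
    BUG_ON(cond);
    sigaction(SIGILL, &old, NULL);
    return false;
}
```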

    When does BUG_ON(!in_nmi()) become BUG_ON(true)?

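    in_nmi() checks whether the NMI bit of the current CPU's preempt_count is set. do_machine_check() calls nmi_enter() so that the machine-check handler runs in NMI context: nmi_enter() adds NMI_OFFSET + HARDIRQ_OFFSET to preempt_count, and nmi_exit() subtracts them again. BUG_ON(!in_nmi()) therefore fires only if the NMI bit is already clear when nmi_exit() runs.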
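    The check itself is just a bit test. The sketch below reproduces the 4.18 field layout of preempt_count as I understand it from include/linux/preempt.h, and applies the same mask to an explicit value instead of the per-CPU counter. in_nmi_value() and hardirq_depth() are hypothetical helpers, not kernel functions.

```c
#include <stdbool.h>

/* preempt_count field widths in 4.18 (assumed from include/linux/preempt.h). */
#define PREEMPT_BITS  8
#define SOFTIRQ_BITS  8
#define HARDIRQ_BITS  4
#define NMI_BITS      1

#define PREEMPT_SHIFT 0
#define SOFTIRQ_SHIFT (PREEMPT_SHIFT + PREEMPT_BITS)  /* 8  */
#define HARDIRQ_SHIFT (SOFTIRQ_SHIFT + SOFTIRQ_BITS)  /* 16 */
#define NMI_SHIFT     (HARDIRQ_SHIFT + HARDIRQ_BITS)  /* 20 */

#define FIELD_MASK(bits, shift) (((1U << (bits)) - 1) << (shift))
#define HARDIRQ_MASK FIELD_MASK(HARDIRQ_BITS, HARDIRQ_SHIFT)
#define NMI_MASK     FIELD_MASK(NMI_BITS, NMI_SHIFT)

/* Same test as the kernel's in_nmi(), applied to a given count value. */
bool in_nmi_value(unsigned count)
{
    return (count & NMI_MASK) != 0;
}

/* Number of nested hardirq entries recorded in the count. */
unsigned hardirq_depth(unsigned count)
{
    return (count & HARDIRQ_MASK) >> HARDIRQ_SHIFT;
}
```

    For example, right after nmi_enter() from an otherwise idle count, the value is (1 << 20) + (1 << 16): in_nmi_value() is true and hardirq_depth() is 1. If anything clears bit 20 before nmi_exit(), the BUG_ON fires.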

    What does NOPTI mean? 2

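    In the oops header, NOPTI means kernel page-table isolation (PTI, the Meltdown mitigation) was not active. PTI is disabled explicitly by adding nopti (or pti=off) to the kernel boot options, and it is also left off on CPUs that are not vulnerable to Meltdown.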
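    To check whether PTI was turned off on purpose, the sketch below looks for the nopti and pti=off tokens in a kernel command line, such as the contents of /proc/cmdline. Keep in mind that the oops header prints NOPTI whenever PTI is inactive, including on CPUs that are not vulnerable, so the absence of these tokens does not prove PTI was on; /sys/devices/system/cpu/vulnerabilities/meltdown reports the kernel's own view. Both helper names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* True if `token` (non-empty) appears as a whole space-separated word
 * in `cmdline`; a trailing newline, as in /proc/cmdline, is accepted. */
static bool has_token(const char *cmdline, const char *token)
{
    size_t n = strlen(token);
    const char *p = cmdline;
    while ((p = strstr(p, token)) != NULL) {
        bool starts = (p == cmdline) || p[-1] == ' ';
        bool ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\n';
        if (starts && ends)
            return true;
        p += 1;  /* keep searching after this partial match */
    }
    return false;
}

/* Hypothetical helper: true if the command line explicitly disables PTI. */
bool pti_disabled_on_cmdline(const char *cmdline)
{
    return has_token(cmdline, "nopti") || has_token(cmdline, "pti=off");
}
```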

    Remedies?

    What can I do if the cause is software? (Most likely)

    • Try booting with the Meltdown mitigation (PTI) enabled if it is currently disabled, or disabled if it is currently enabled.
    • Upgrade to Linux 5.4.100 or later, if possible. One example is a bug reported by Intel two years before the release of Ice Lake.
    • Migrate to AlmaLinux or Rocky Linux, CentOS 8.5's spiritual successors.
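    Before planning an upgrade, a quick way to compare the running kernel against the 5.4.100 suggestion is to parse the release string returned by uname(2). kernel_at_least() is a hypothetical helper, and it only compares the upstream version prefix: CentOS/RHEL kernels stay at 4.18 while backporting many fixes, so a lower version number does not by itself prove that a particular fix is missing.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: parses the leading "major.minor.patch" of a kernel
 * release string such as "4.18.0-348.el8.x86_64" and returns true if it is
 * at least want_major.want_minor.want_patch. Unparseable input -> false. */
bool kernel_at_least(const char *release, int want_major, int want_minor,
                     int want_patch)
{
    int major = 0, minor = 0, patch = 0;
    if (sscanf(release, "%d.%d.%d", &major, &minor, &patch) < 2)
        return false;  /* need at least "major.minor" */
    if (major != want_major)
        return major > want_major;
    if (minor != want_minor)
        return minor > want_minor;
    return patch >= want_patch;
}
```

    For example, after struct utsname u; uname(&u);, the call kernel_at_least(u.release, 5, 4, 100) returns false on 4.18.0-348.el8.x86_64.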

    What if it is a hardware defect? (Less likely)

    • Try replacing suspect components, such as memory modules, one at a time (in configurations compatible with the system logic board) until you find the defective one.
    • Replace the system logic board and/or other PCIe devices.

    TL;DR

    A working theory for this issue is that an uncorrected hardware memory error triggers the machine check that do_machine_check() handles in NMI context, and that a coding error causes BUG_ON(!in_nmi()) to be evaluated before the second (nested) increment of preempt_count has taken place.
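    The theory boils down to one condition: when nmi_exit() runs, the NMI bit in preempt_count is no longer set. The sketch below models the 4.18 nmi_enter()/nmi_exit() pair on a plain variable and records where the kernel would execute BUG(). The stray_nmi_sub flag is a hypothetical stand-in for whatever code path unbalances the count; it illustrates the failure condition, not the actual root cause, which this sketch does not identify.

```c
#include <stdbool.h>

/* preempt_count offsets used by nmi_enter()/nmi_exit() in 4.18. */
#define HARDIRQ_OFFSET (1U << 16)
#define NMI_OFFSET     (1U << 20)
#define NMI_MASK       NMI_OFFSET   /* NMI_BITS == 1, so the mask is one bit */

/* Hypothetical stand-ins for per-CPU state. */
static unsigned preempt_count;
static bool bug_hit;                /* set where the kernel would call BUG() */

static bool in_nmi(void)
{
    return (preempt_count & NMI_MASK) != 0;
}

static void model_nmi_enter(void)
{
    if (in_nmi())                   /* BUG_ON(in_nmi()) */
        bug_hit = true;
    preempt_count += NMI_OFFSET + HARDIRQ_OFFSET;
}

static void model_nmi_exit(void)
{
    if (!in_nmi())                  /* BUG_ON(!in_nmi()) */
        bug_hit = true;
    preempt_count -= NMI_OFFSET + HARDIRQ_OFFSET;
}

/* Runs enter, an optional hypothetical stray decrement, then exit, and
 * reports whether a BUG_ON would have fired. */
bool exit_check_fires(bool stray_nmi_sub)
{
    preempt_count = 0;
    bug_hit = false;
    model_nmi_enter();              /* entry to do_machine_check() */
    if (stray_nmi_sub)
        preempt_count -= NMI_OFFSET;
    model_nmi_exit();               /* exit from do_machine_check() */
    return bug_hit;
}
```

    A balanced enter/exit pair never trips the check; any path that clears the NMI bit in between makes nmi_exit() hit BUG_ON(!in_nmi()), which is the BUG reported at core.c:1364.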

    In this particular case, the original poster used the einj_mem_uc tool to generate a simulated memory error, which is what initiated the NMI-context machine-check handling.