What happens in case of a kernel panic while handling a kernel panic?


I have a pretty stupid question:

What happens if a kernel panic occurs while handling a kernel panic?

Does the computer just crash? Where? In the "hardware"? In the BIOS? Does it just randomly execute whatever code it lands in, potentially modifying things it isn't supposed to? What if it happens again?

What does it look like? Is the display able to show something, or does it lose its contents because some "buffer" doesn't get updated?


Solution

  • I'd love to paraphrase Fermat here and say that I've found a wonderful compendium of knowledge for you, but the margin of a SO answer is too narrow to contain it all ...

    This is a complex question. First, the very premise, "kernel panic", is imprecise. The meaning of the term depends on the operating system, but even within the same OS kernel there are different "kinds" of (fatal) aborts. To illustrate: most OS kernels allow kernel/driver code to explicitly call a function panic(...). But there are also implicit fault handlers that will "panic"; for example, a pagefault caused by a NULL pointer dereference while executing kernel code is likely irrecoverable, so the pagefault handler will abort ("panic") when it identifies that condition. In addition, certain hardware has "unhandleable" failure conditions (for which an operating system cannot register a codeblock to be called on occurrence), such as x86's triple faults; such faults may simply reset the CPU, through mechanisms that bypass the OS entirely.
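
    To make the "explicit call" flavour concrete, here is a minimal sketch of a hypothetical out-of-tree module (the name and message are made up) that panics the moment it is loaded - the most direct way to watch a panic happen in a throwaway VM:

        /* mypanic.c - illustration only; never load this outside a scratch VM.
         * Kernel code can panic simply by calling panic() explicitly.         */
        #include <linux/module.h>
        #include <linux/kernel.h>   /* declares panic() (via linux/panic.h on newer kernels) */

        static int __init mypanic_init(void)
        {
                panic("mypanic: deliberate panic from module init");
                return 0;           /* never reached */
        }

        module_init(mypanic_init);
        MODULE_LICENSE("GPL");
        MODULE_DESCRIPTION("Deliberately panics on load (illustration)");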

    Second, what would, or should, happen "during" a kernel panic? Without prejudice, let's assume a "simple" case: have the kernel log a hopefully informative diagnostic message, and then reset/reboot the system. That already means the panic() code has to perform quite a few steps (sketched in code after this list):

    • to "quiesce" the system. Given we likely have other CPUs / threads running, that interrupts may yet occur, that event timers are still live, the code will have to take measures to stop all other execution, silence all interrupt sources, park every thread (see, for example, How are threads terminated during a linux crash?), timer, registered trigger, ... to "ready" the system for a "stable" diagnostic state snapshot
    • to "record" the panic state to the log; this might mean to run codethat extracts and collects information like CPU register records, stacktraces, CPU and HW state(s), and various other kernel or driver metadata.
    • to "output" that log. This is a form of I/O and may require to "reactivate" and execute kernel functionality that was shutdown to take the state snapshot, like drivers, so that the log can be written to the desired destination
    • to "reset" the system. This again is likely to mean calling driver or firmware code to perform the action (or on x86, trigger a "triple fault" but this is a CPU/hardware-specific thing), and is extremely operating-system and hardware-specific.

    All of these execute code, and that code can have bugs. Also, the "state freeze" might fail or complete only partially, and then code that was never designed to run concurrently (the panic code and, say, unrelated driver interrupts) could end up running at the same time, interfering with each other. And, depending on the implementation, some of these steps might run "asynchronously" to panic(), i.e. not in the same thread context, or serially, for example after the "panic report" has finished. So additional errors or diagnostic messages may be interspersed with the first, but also potentially follow it (as "secondaries", if you like to call them that).

    And as mentioned, this is the "simple" case; an OS may want to take other measures intended to allow recovery - such as "fencing" a panic to the driver that triggered it and removing it / preventing it from loading, in order to bring the operating system "up" on a retry. How successful such strategies may be I cannot "speculatively quantify". Linux, for example, has a few kernel tunables related to "when to panic", and developers / kernel programmers may use them differently depending on what issues they encounter.
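
    For reference, a few of those Linux tunables (these are real sysctl names; the values shown are only examples):

        kernel.panic = 10            # after a panic, wait 10 seconds, then reboot (0 = hang forever)
        kernel.panic_on_oops = 1     # escalate an "oops" (e.g. a bad kernel pagefault) to a full panic
        kernel.softlockup_panic = 1  # panic if a CPU appears stuck in kernel mode for too long
        kernel.hung_task_panic = 1   # panic if a task stays blocked beyond the hung-task timeout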

    Can such errors "recurse" or "loop" ? You might get an unhandled pagefault while the pagefault error code tries to print a stacktrace for a previous NULL ptr dereference, and then another series of pagefaults because the stack got corrupted, and then another pagefault for a non-mapped address because the recursion eventually overflows the kernel stack. At which point the handlers will hopefully switch to a separate stack (on x86, "double fault"), and if you're lucky, print you a message saying "kernel stack overflow on thread ..." or some such. Or not ... and also, even if the panic sequence succeeds and the system resets, on next reboot, the same issue may reoccur, the panic may hit again, print-diags, reset, rinse-repeat ... again not something unseen at all.

    Can such errors "hang" ? Yes, they can; code that called panic() might hold resources such as locks that would be required by (later) parts of the panic handling, and if no specific measures are taken for panic code to "blast through" locks, a deadlock may occur. There are measures (to "break locks" when in-panic, to allow for recursive locking, or to retain a running watchdog timer to detect this and break out of it) that kernel code can take to mitigate this, but again it's operating system (and firmware) specific what is done in such cases. Linux, for example, knows a dozen different ways to "reboot" an x86 system, see reboot= kernel parameter

    It's not unheard of that a "kernel panic" triggers a long series of kernel diagnostic messages for "secondary" panics that occurred while recording the first diagnostics, or while attempting to reboot the system. You'll get a huge amount of log output; if, for example, shutting down the secondary (non-panicking) CPU cores failed, panic messages from threads running on several different CPU cores may even interleave.

    In an ideal world, "at some point" a system reset would break through, the machine would reboot, and hopefully admins and/or developers can make sense of the diagnostics. In the real world, though, it's not unheard of that what alerts sysadmins to a machine "stuck panicking" is a monitoring system event saying "I've not had metrics from this box for 20 minutes", or "> 1GB of kernel console logs written from this machine in the last 10 minutes", which will then make someone, person or external monitoring agent, take action.

    There is a bit of "art" in troubleshooting such issues - to look for how this started, where "the mess" began and how, instead of "just" blaming the hardware and de-racking the affected system. Whether that is necessary or useful in your environment I again cannot speculate on.

    If you want to play with this yourself, the Linux kernel has several mechanisms that allow "injecting" panics, for example the provoke-crashes facility (LKDTM) or the KASAN test facilities. Build a kernel with the corresponding options, boot it in QEMU, "and see" (whether you can combine them in some way to trigger multiple and/or recursive/nested panics). Have fun!
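
    For example (in a disposable VM only), two well-known crash-injection knobs; the first needs magic-sysrq enabled, the second a kernel built with CONFIG_LKDTM and debugfs mounted:

        echo c > /proc/sysrq-trigger                          # sysrq "crash": force a kernel NULL dereference
        echo PANIC > /sys/kernel/debug/provoke-crash/DIRECT   # LKDTM: directly inject a chosen crash type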

    Or, in short, "it can be complicated" :-)