Coming from the Windows world, I assume that vmlinuz is the equivalent of ntoskrnl.exe, i.e. the kernel executable that gets mapped into kernel memory.
Now I want to figure out whether an address inside the kernel belongs to the kernel executable or not. Is using core_kernel_text the correct way of finding this out?
I ask because core_kernel_text doesn't return true for some addresses that clearly should belong to the Linux kernel executable.
For example, core_kernel_text doesn't return true when I give it the syscall entry handler address, which can be found with the following code:
unsigned long system_call_entry;
/* MSR_LSTAR (IA32_LSTAR) holds the 64-bit syscall entry point */
rdmsrl(MSR_LSTAR, system_call_entry);
return (void *)system_call_entry;
When I use this code snippet, the address of the syscall entry handler belongs neither to the core kernel text nor to any kernel module (checked with get_module_from_addr).
So how can the address of a handler that clearly belongs to the Linux kernel executable, such as the syscall entry, belong neither to the core kernel nor to any kernel module? What does it belong to, then? Which API do I need to use for this type of address to assure me that it does indeed belong to the kernel?
I need such an API because I'm writing a detection for malicious kernel modules that patch such handlers, and for that I need to make sure the address belongs to the kernel and not to some third-party kernel module or a random kernel address. (Please do not discuss methods that can be used to bypass my detection; obviously it can be bypassed, but that's another story.)
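For reference, a minimal sketch of the kind of classification I'm attempting (core_kernel_text_ is resolved via kallsyms_lookup_name as in the reproducible code below; is_module_text_address is a kernel helper declared in linux/module.h that serves the same purpose as get_module_from_addr, and if it isn't usable directly on your kernel it can be resolved via kallsyms_lookup_name too):

#include <linux/module.h>  /* is_module_text_address() */

/* Sketch: roughly classify a kernel-space text address. */
static const char *classify_address(unsigned long addr)
{
        if (core_kernel_text_(addr))
                return "core kernel text";
        if (is_module_text_address(addr))
                return "module text";
        return "neither core kernel nor any module";
}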
The target kernel version is 4.15.0-112-generic, running Ubuntu 16.04 as a VMware guest.
Reproducible code as requested:
typedef int (*core_kernel_text_t)(unsigned long addr);
core_kernel_text_t core_kernel_text_;

core_kernel_text_ = (core_kernel_text_t)kallsyms_lookup_name("core_kernel_text");

unsigned long system_call_entry;
rdmsrl(MSR_LSTAR, system_call_entry);

int isInsideCoreKernel = core_kernel_text_(system_call_entry);
printk("%d , 0x%pK\n", isInsideCoreKernel, (void *)system_call_entry);
EDIT1: So in the MSR_LSTAR example that I gave above, it turns out it's related to Kernel Page Table Isolation and CONFIG_RETPOLINE=y in the config:
system_call value is different each time when I use rdmsrl(MSR_LSTAR, system_call)
And that's why I am getting the address 0xfffffe0000006000, a.k.a. SYSCALL64_entry_trampoline, the same as in the question above.
So now the question remains: why does this SYSCALL64_entry_trampoline address not belong to anything? It doesn't belong to any kernel module, and it doesn't belong to the core kernel, so which executable does this address belong to, and how can I check that with an API similar to core_kernel_text? It seems to belong to cpu_entry_area, but what is that, and how can I check whether an address belongs to it?
You are seeing this "weird" address in MSR_LSTAR (IA32_LSTAR) because of Kernel Page-Table Isolation (KPTI), which mitigates Meltdown. As other existing answers(1) you already found point out, the address you see is that of a small trampoline (entry_SYSCALL_64_trampoline) that is dynamically remapped at boot time by the kernel for each CPU, and thus does not have an address within the kernel text.
(1) By the way, the answer linked above wrongly states that the corresponding config option for KPTI is CONFIG_RETPOLINE=y: "retpoline" is a mitigation for Spectre, not Meltdown. The config option that enables KPTI is CONFIG_PAGE_TABLE_ISOLATION=y.
You don't have many options here.
You can implement support for this by detecting whether the kernel supports KPTI (CONFIG_PAGE_TABLE_ISOLATION), and if so, checking whether the current CPU has KPTI enabled. The code in arch/x86/kernel/cpu/bugs.c that provides the information for /sys/devices/system/cpu/vulnerabilities/meltdown shows how this can be detected:
ssize_t cpu_show_meltdown(struct device *dev,
                          struct device_attribute *attr, char *buf)
{
        if (!boot_cpu_has_bug(X86_BUG_CPU_MELTDOWN))
                return sprintf(buf, "Not affected\n");
        if (boot_cpu_has(X86_FEATURE_PTI))
                return sprintf(buf, "Mitigation: PTI\n");
        return sprintf(buf, "Vulnerable\n");
}
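The relevant check for our purpose is boot_cpu_has(X86_FEATURE_PTI): KPTI support may be compiled in (CONFIG_PAGE_TABLE_ISOLATION=y) yet disabled at boot (e.g. with the pti=off or nopti command-line parameters), in which case MSR_LSTAR points directly at entry_SYSCALL_64 inside the core kernel text.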
The actual trampoline is set up at boot and its address is stored in each CPU's "entry area" for later use (e.g. here when setting up IA32_LSTAR). This answer on Unix & Linux SE explains the purpose of the cpu entry area and its relation to KPTI.
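To illustrate, this is roughly what the v4.15 kernel does in syscall_init() (arch/x86/kernel/cpu/common.c) when programming IA32_LSTAR; simplified excerpt, details may vary between kernel versions:

/* Simplified excerpt from syscall_init(): the trampoline address is this
 * CPU's entry area plus the offset of the entry point within the
 * trampoline section. */
extern char _entry_trampoline[];
extern char entry_SYSCALL_64_trampoline[];

int cpu = smp_processor_id();
unsigned long SYSCALL64_entry_trampoline =
        (unsigned long)get_cpu_entry_area(cpu)->entry_trampoline +
        (entry_SYSCALL_64_trampoline - _entry_trampoline);

if (static_cpu_has(X86_FEATURE_PTI))
        wrmsrl(MSR_LSTAR, SYSCALL64_entry_trampoline); /* KPTI: per-CPU trampoline */
else
        wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64); /* no KPTI: kernel text */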
In your module you can do the following detection:
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/kallsyms.h>
#include <asm/msr-index.h>
#include <asm/msr.h>
#include <asm/cpufeature.h>
#include <asm/cpu_entry_area.h>

// ...

typedef int (*core_kernel_text_t)(unsigned long addr);
core_kernel_text_t core_kernel_text_;

bool syscall_entry_64_ok(void)
{
        unsigned long entry;

        rdmsrl(MSR_LSTAR, entry);

        /* Without KPTI, MSR_LSTAR points directly into the kernel text. */
        if (core_kernel_text_(entry))
                return true;

#ifdef CONFIG_PAGE_TABLE_ISOLATION
        /* With KPTI, it should point into this CPU's entry trampoline page. */
        if (this_cpu_has(X86_FEATURE_PTI)) {
                int cpu = smp_processor_id();
                unsigned long trampoline =
                        (unsigned long)get_cpu_entry_area(cpu)->entry_trampoline;

                if ((entry & PAGE_MASK) == trampoline)
                        return true;
        }
#endif

        return false;
}
static int __init modinit(void)
{
        core_kernel_text_ = (core_kernel_text_t)kallsyms_lookup_name("core_kernel_text");
        if (!core_kernel_text_)
                return -EOPNOTSUPP;

        pr_info("syscall_entry_64_ok() -> %d\n", syscall_entry_64_ok());
        return 0;
}

module_init(modinit);
MODULE_LICENSE("GPL");
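Note the (entry & PAGE_MASK) comparison: get_cpu_entry_area(cpu)->entry_trampoline is the page-aligned start of the per-CPU trampoline page, while MSR_LSTAR holds the address of the entry_SYSCALL_64_trampoline symbol at some offset inside that page, so the check compares pages rather than exact pointers. Also, rdmsrl() and smp_processor_id() both refer to whichever CPU the code happens to run on; to validate every CPU you would need to run the check on each of them (e.g. via on_each_cpu()).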