
A weird question about fork() and physical address in Linux


A professor who teaches Linux sent his students this weird question...

The weird thing is that when I run this program as a regular (non-root) user, the output shows the parent and the child process with the same physical addresses, but when I run it as root, the output shows different physical addresses for parent and child, as below.

When run as a regular user:

pid:5269, ppid:3152
pid:5270, ppid:5269
Child process : �
virtual addr of str=0x7ffd7023bfd0 and &count=0x7ffd7023bfcc, physical addr of str=0xfd0,&count=0xfcc
Father process : �
count: 1 (0x7ffd7023bfcc), pid: 5269
virtual addr of str=0x7ffd7023bfd0 and count=0x7ffd7023bfcc, physical addr of str=0xfd0,&count=0xfcc
count: 2 (0x7ffd7023bfcc), pid: 5270

When run as root:

pid:5294, ppid:3414
pid:5295, ppid:5294
Child process : �
virtual addr of str=0x7ffe501a1530 and &count=0x7ffe501a152c, physical addr of str=0x1298db530,&count=0x1298db52c
Father process : �
count: 1 (0x7ffe501a152c), pid: 5294
virtual addr of str=0x7ffe501a1530 and count=0x7ffe501a152c, physical addr of str=0x12282b530,&count=0x12282b52c
count: 2 (0x7ffe501a152c), pid: 5295

I just can't figure this out. Can anyone help?

// file name proc-1.c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/stat.h>
//#include <linux/capability.h>
//#include <linux/sched.h>
intptr_t mem_addr(unsigned long vaddr, unsigned long *paddr)
{
    int pagesize = getpagesize();
    unsigned long v_pageindex = vaddr / pagesize;
    unsigned long v_offset = v_pageindex * sizeof(uint64_t);
    unsigned long page_offset = vaddr % pagesize;
    uint64_t item = 0;
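    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
    {
        perror("open /proc/self/pagemap");
        return 0;
    }
    lseek(fd, v_offset, SEEK_SET);
    read(fd, &item, sizeof(uint64_t));
    close(fd); // avoid leaking one descriptor per call
    if((((uint64_t)1 << 63) & item) == 0)
    {
        printf("page present is 0\n");
        return 0 ;
    }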
    uint64_t phy_pageindex = (((uint64_t)1 << 55)- 1) & item;
    *paddr = (phy_pageindex * pagesize) + page_offset;
    return *paddr;
}

int main(void)
{
    char str[10];
    int count = 1;
    unsigned long pa[2]={0,0};
    int fd = open("test.txt", O_RDWR);
    if(fork() == 0)
    {
        printf("pid:%d, ppid:%d\n",getpid(), getppid());
        read(fd, str, 10);
        count += 5;
        printf("Child process : %s\n", (char *)str);
        mem_addr((unsigned long)str, &pa[0]);
        mem_addr((unsigned long)&count, &pa[1]);
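        // %p expects void *, so cast the integer "physical addresses" explicitly
        printf("virtual addr of str=%p and count=%p, physical addr of str=%p,&count=%p\n",(void *)str,(void *)&count, (void *)pa[0], (void *)pa[1]);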
        printf("count: %d (%p), pid: %d\n", count, &count, getpid());
    }
    else
    {
        printf("pid:%d, ppid:%d\n",getpid(), getppid());
        read(fd, str, 10);
        //count ++;
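        // %p expects void *, so cast the intptr_t results of mem_addr() explicitly
        printf("virtual addr of str=%p and &count=%p, physical addr of str=%p,&count=%p\n",(void *)str,(void *)&count, (void *)mem_addr((intptr_t)str, &pa[0]), (void *)mem_addr((intptr_t)&count,&pa[1]));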
        printf("Father process : %s\n", (char *)str);
        printf("count: %d (%p), pid: %d\n", count, &count, getpid());
    }
    sleep(10);
    return 0;
}

I am fairly sure this is related to the file /proc/pid/pagemap, but I can't work out how.


Solution

  • When you are not root, you are just reading a zeroed-out page frame number (PFN) from /proc/self/pagemap. You need CAP_SYS_ADMIN to get the correct PFNs. In fact, the "physical addresses" you get, 0xfd0 and 0xfcc, are suspiciously low: they are just the result of 0 * pagesize + page_offset.

    Kernel documentation confirms the above:

    Since Linux 4.0 only users with the CAP_SYS_ADMIN capability can get PFNs. In 4.0 and 4.1 opens by unprivileged fail with -EPERM. Starting from 4.2 the PFN field is zeroed if the user does not have CAP_SYS_ADMIN. Reason: information about PFNs helps in exploiting Rowhammer vulnerability.

    The variables you are inspecting are on the stack, and the parent and the child cannot keep sharing the same physical stack page for long. If you think about it, things would pretty easily break otherwise. Right after fork() the pages are shared copy-on-write, but as soon as the stack is written to (i.e. as soon as fork() does anything with its local variables or returns), the copy happens and the child gets a new physical page different from the one of the parent.

    To correctly observe this even with a process that is not root, you will have to read /proc/[pid]/pagemap from another process that is root. Start the first one, make it print the PIDs and pause waiting for input, then inspect /proc/[pid]/pagemap with another process running as root. You will see that the two physical addresses are different.
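
As a rough illustration of the inspection described above, here is a minimal sketch of a lookup helper meant to be called from a separate process running as root. The names decode_entry and lookup_paddr and their return codes are hypothetical (they are not from the question's code); the bit layout (bit 63 = page present, bits 0-54 = PFN) is the one the question's mem_addr() already relies on. Unlike mem_addr(), the sketch treats a zero PFN as "hidden" instead of silently computing 0 + page_offset.

```c
#define _FILE_OFFSET_BITS 64 /* pagemap offsets can exceed 32 bits */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define PM_PRESENT  (UINT64_C(1) << 63)        /* bit 63: page is in RAM */
#define PM_PFN_MASK ((UINT64_C(1) << 55) - 1)  /* bits 0-54: PFN */

/* Hypothetical helper: decode one 64-bit pagemap entry.
 * Returns  1 and stores the physical address in *paddr on success,
 *          0 if the page is not present,
 *         -1 if the page is present but the PFN reads as zero
 *            (what an unprivileged reader sees since Linux 4.2). */
int decode_entry(uint64_t entry, uint64_t page_size,
                 uint64_t page_offset, uint64_t *paddr)
{
    if (!(entry & PM_PRESENT))
        return 0;
    uint64_t pfn = entry & PM_PFN_MASK;
    if (pfn == 0)
        return -1;
    *paddr = pfn * page_size + page_offset;
    return 1;
}

/* Hypothetical helper: translate vaddr in process pid through
 * /proc/<pid>/pagemap. Returns what decode_entry() returns,
 * or -2 if the file cannot be opened or read. */
int lookup_paddr(pid_t pid, uint64_t vaddr, uint64_t *paddr)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%ld/pagemap", (long)pid);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -2;

    uint64_t page_size = (uint64_t)sysconf(_SC_PAGESIZE);
    uint64_t entry = 0;
    /* One 8-byte entry per virtual page, indexed by virtual page number. */
    off_t off = (off_t)(vaddr / page_size * sizeof entry);

    int ok = lseek(fd, off, SEEK_SET) == off &&
             read(fd, &entry, sizeof entry) == (ssize_t)sizeof entry;
    close(fd);
    if (!ok)
        return -2;

    return decode_entry(entry, page_size, vaddr % page_size, paddr);
}
```

While the parent and child are still sleeping, a root process could call lookup_paddr() once with each PID and the virtual address the program printed (for example the address of str), then compare the two results. A return value of -1 means the caller most likely lacks CAP_SYS_ADMIN, which is the situation the non-root run in the question was in.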