Search code examples
linuxmemoryprocessoperating-systemvirtual-memory

Why Linux has 4 layers of "page tables" and how it works exactly


I am trying to grok page layouts according to these notes, having just started to learn about operating systems, so it's a bit over my head atm. Specifically it says this:

In linux there are 4 'levels' of page tables. A table consists of an array of entries of type pXX_t, wrapping a pXXval_t:

  1. Page Global Directory (PGD) - pgd_t/pgdval_t.
  2. Page Upper Directory (PUD) - pud_t/pudval_t.
  3. Page Middle Directory (PMD) - pmd_t/pmdval_t.
  4. Page Table Entry directory (PTE) - pte_t/pteval_t.

These types are simply typedef'd wrappers around fundamental architecture-dependent types, gathering all the x86-64 types together...

Then they draw this:

    6         5         4         3         2         1
4321098765432109876543210987654321098765432109876543210987654321
0000000000000000000000000000000000000000101100010111000000010000
[   RESERVED   ][  PGD  ][  PUD  ][  PMD  ][  PTE  ][  OFFSET  ]
                         |        |        |        |-PAGE_SHIFT
                         |        |        |-----------PMD_SHIFT
                         |        |--------------------PUD_SHIFT
                         |---------------------------PGDIR_SHIFT

It's still over my head, particularly with all the abbreviations and such.

Could one explain in slightly gentler terms what exactly we have here? I would like to implement virtual memory pagination in a JS operating system, to simulate sort of a realistic operating system. I would like to know what structs are being used and how they actually look in the memory itself (or if done by assembly, whatever is easier to use to explain/teach).

So right at the moment I am trying to figure out how (in JavaScript), to use 1 array to manage all of the memory of multiple processes. No cheating and using other JavaScript arrays or objects or whatever else, only 1 array, and then integers pretty much. I am stuck at trying to figure out how I can create an array (a slice of memory) in memory, without using arrays haha. Then to make it more complicated, I need to make pages in this 1 array, which map to the sections of memory that are allowed per process. I am trying to figure out in detail how to do this, and these notes are the closest I've come to seeing it done realistically, but they are a bit too detailed atm. Wondering if one could help simplify the explanation just a little, and to help paint the picture of how to create the pages/process mappings in a single array. Explaining this Linux system would help to visualize this.


Solution

  • This is virtual memory.

    Specifically, it's the x86-64 HW page-table layout that Linux uses to map one process's virtual address space into physical memory, with page granularity, when running on that ISA. Why in 64bit the virtual address are 4 bits short (48bit long) compared with the physical address (52 bit long)? has a diagram.

    It's a radix tree that breaks addresses up into 9-bit chunks, with 12 page offset bits. (12 + 4*9 = 48 bit virtual addresses, required to be correctly sign-extended to 64-bit). 32-bit x86 page tables use 2 levels of 10 bits each (12 + 2*10 = 32-bit virtual addresses).

    On a TLB miss, hardware walks this table to reach a PTE (Page Table Entry), or an "invalid" entry in which case it raises a #PF exception.


    Judging by your other questions, you should put this on your TODO list for much later, after you understand the basics like how a stack works.

    Just simulate a process, not a whole system, so you only need to consider its private address space. Under a real OS, a process's address space is virtual and corresponds directly to one big JS array. So you have one JS array per process, holding its memory (and registers separately, or as part of that array).

    This is basically leaving virtual memory up to the JS implementation, because you're running on JS not on real hardware.

    Unless you're writing a full simulator for some actual hardware ISA, this should be more than fine.

    CPUs use dedicated hardware (a TLB) to cache translations. In JS you'd need to indirect every memory access through a mapping function which would make your source code pretty painful. You could use a JS version of a TLB by using a map/dictionary data structure instead of actual page tables.

    IDK if you could do anything with try{}catch to correspond to page faults.

    Maybe also consider TypedArray to get different "views" of the same underlying buffer? IDK how that could help emulate non-contiguous virtual memory with holes in it. I think in JS you're going to want your "processes" to keep their virtual memory usage pretty dense / contiguous, otherwise you could be wasting huge amounts of memory (and not detecting accesses to "unmapped" pages).