Search code examples
cachingoperating-systempage-tables

Page Table and Cache Hit Rates


I made a post about page table and the amount of registers needed for a multi level page table and fount out that every page table, regardless of the level, only needs one register to access the top of the page table. But my second question has not been answered.

How will cache (L1-L3) in the processor affect memory reference access to page table? Will the majorities miss or hit? Why does it happen? I am told that this topic may have different answers based on the architectures used, so maybe general answers would be fine.

I tried to find references for this, but I cannot find it. Might say that I am really beginner in OS.

The link to my previous question: Page Table Registers and Cache

Edit: Because of TLB, the access of memory reference to Page Table can be reduced, causing it to get more hits. Is it correct? Help please :D


Solution

  • The basic idea (without any caches of any kind) is that when you access memory the CPU:

    • finds the highest level page table (e.g. from the virtual address and a control register) and fetches the highest level page table entry from RAM

    • finds the next level page table (e.g. from the virtual address and highest level page table entry) and fetches the next level page table entry from RAM; and so on (repeated for each level of page tables) until the CPU reaches the lowest level page table entry.

    • finds the physical address (e.g. from the virtual address and lowest level page table entry), and fetches the data from that physical address

    This is obviously slow. To speed it up there are multiple "cache like things":

    a) The caches themselves. E.g. rather than fetching anything from RAM the CPU may fetch from cache instead (including when CPU fetches page table entries). Note that typically there's multiple levels of caches (e.g. L1 data cache, L2 unified cache, ...) and this may apply to some caches and not others (e.g. CPU won't fetch page table entries from "L1 instruction cache" but probably will fetch them from "L3 unified cache").

    b) The TLBs (Translation Look-aside Buffers); which mostly cache the lowest level page table entry. This allows almost all of the work to be skipped (if there's a "TLB hit").

    c) Higher level translation caches. Modern CPUs have additional caches that cache an intermediate level of the page table heirarchy (e.g. maybe the 3rd level page table entry if there's 4 or more levels, and not the highest or lowest level entry). These reduce the cost of "TLB miss" (if there's a "higher level translation hit") by allowing some of the work to be skipped.