arm64 virtual-memory mmu armv8

Understanding AArch64 Translation Tables


I'm doing a hobby OS project and I am trying to get virtual memory set up. I had another project on the x86 architecture working with page tables, but I am now learning ARMv8.

I know that the maximum number of bits used for addressing is 48[1]. The last 12 to 16 bits are used "as-is" to index within the selected region (depending on which granule size is selected[2]).

I just don't understand how we get those intermediate bits. Obviously the documentation is showing that intermediate tables are used[3] but it is quite unclear on how those tables are used.

In the first half of the following image, we see translation of an address with 4k granules and using 38 address bits.

Translation table graphic

I can't understand this image in the slightest. The "offsets", for example bits 38 to 30 point to an entry in the L1 table. How and where is this table defined ?

What I think is happening is that this is a 12+8+8+8 address translation scheme. Starting from the right, 12 bits find an offset within a 4096-byte block of memory. To the left of that are 8 bits for L3, meaning that L3 indexes 256 blocks of 4096 bytes (1MB). To the left of this, L2 also has 8 bits, so 256 entries of (256*4096) bytes, totalling 256MB per L2 table. To the left of L2 is L1, also with 8 bits, and 256 entries of 256MB means the total addressable memory is 64GB of physical RAM.

I don't think this is correct, because that would only allow a 1:1 mapping of memory. Each table descriptor needs to carry some access flags and whatnot. So, going back to the question: how are those tables defined? Each offset section is 8 bits, and that's not enough to contain the address of a translation table.

Anyway, I am completely lost. I would appreciate it if someone could give me a "plain English" explanation of how a translation table walk is done. A graph would be nice but is probably too much effort; I'll make one and share it afterwards to help me synthesize the information. Or at least, if someone has one, a link to a good video/guide where the information isn't totally obfuscated?


Here is the list of materials I have consulted:

https://developer.arm.com/documentation/den0024/a/The-Memory-Management-Unit/Translating-a-Virtual-Address-to-a-Physical-Address

https://forums.raspberrypi.com/viewtopic.php?t=227139

https://armv8-ref.codingbelief.com/en/chapter_d4/d42_4_translation_tables_and_the_translation_proces.html

https://github.com/bztsrc/raspi3-tutorial/blob/master/10_virtualmemory/mmu.c


[1] https://developer.arm.com/documentation/den0024/a/The-Memory-Management-Unit/Translation-tables-in-ARMv8-A

[2] https://developer.arm.com/documentation/den0024/a/The-Memory-Management-Unit/Translation-tables-in-ARMv8-A/Effect-of-granule-sizes-on-translation-tables

[3] https://developer.arm.com/documentation/den0024/a/The-Memory-Management-Unit/Translating-a-Virtual-Address-to-a-Physical-Address


Solution

  • The entire model behind translation tables arises from three values: the size of a translation table entry (TTE), the hardware page size (aka "translation granule"), and the number of bits used for virtual addressing.

    On arm64, TTEs are always 8 bytes. The hardware page size can be one of 4KiB, 16KiB or 64KiB (0x1000, 0x4000 or 0x10000 bytes), depending on both hardware support and runtime configuration. The number of bits used for virtual addressing similarly depends on hardware support and runtime configuration, but with much more complex constraints.
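
To make those numbers concrete, here is a small sketch of how the TTE size and the granule size together determine how many entries one table holds, and therefore how many virtual address bits one table level consumes. It assumes that a full table occupies exactly one granule (the example further down relies on the same fact); the function names are made up for illustration.

```python
TTE_SIZE = 8  # bytes per translation table entry, as stated above


def entries_per_table(granule_size):
    """Number of TTEs that fit in one table of `granule_size` bytes."""
    return granule_size // TTE_SIZE


def index_bits(granule_size):
    """Virtual address bits needed to index one full table (log2 of entries)."""
    return entries_per_table(granule_size).bit_length() - 1


for granule in (0x1000, 0x4000, 0x10000):
    print(hex(granule), entries_per_table(granule), index_bits(granule))
```

For the 4KiB granule this gives 512 entries and 9 index bits per level, which are the numbers the rest of this answer works with.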

    By example

    For the sake of simplicity, let's consider address translation under TTBR0_EL1 with no block mappings, no virtualization going on, no pointer authentication, no memory tagging, no "large physical address" support and the "top byte ignore" feature being inactive. And let's pick a hardware page size of 0x1000 bytes and 39-bit virtual addressing.

    From here, I find it easiest to start at the result and go backwards in order to understand why we arrived here. So suppose you have a virtual address of 0x123456000 and the hardware maps that to physical address 0x800040000 for you. Because the page size is 0x1000 bytes, that means that for 0 <= n <= 0xfff, all accesses to virtual address 0x123456000+n will go to physical address 0x800040000+n. And because 0x1000 = 2^12, that means the lowest 12 bits of your virtual address are not used for address translation, but indexing into the resulting page. Though the ARMv8 manual does not use this term, they are commonly called the "page offset".

    63                                                         12 11           0
    +------------------------------------------------------------+-------------+
    |                         upper bits                         | page offset |
    +------------------------------------------------------------+-------------+
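
As a quick illustration of the page offset, the sketch below splits an address into its page base and offset and applies the single example mapping from above. The helper names are hypothetical, and only the 4KiB page size from this example is assumed.

```python
PAGE_SIZE = 0x1000
PAGE_OFFSET_MASK = PAGE_SIZE - 1  # low 12 bits


def split_page_offset(va):
    """Return (page base, page offset) for a virtual address."""
    return va & ~PAGE_OFFSET_MASK, va & PAGE_OFFSET_MASK


def translate_in_page(va, va_page_base, pa_page_base):
    """Translate `va` using one known page mapping; the offset is copied as-is."""
    page, offset = split_page_offset(va)
    if page != va_page_base:
        raise ValueError("address is outside the mapped page")
    return pa_page_base + offset


print(hex(translate_in_page(0x123456abc, 0x123456000, 0x800040000)))
```

The offset bits pass through untouched; only the bits above them are looked up.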
    

    Now the obvious question is: how did we get 0x800040000? And the obvious answer is: we got it from a translation table. A "level 3" translation table, specifically. Let's defer how we found that for just a moment and suppose we know it's at 0x800037000. One thing of note is that translation tables adhere to the hardware page size as well, so we have 0x1000 bytes of translation information there. And because we know that one TTE is 8 bytes, that gives us 0x1000/8 = 0x200, or 512 entries in that table. 512 = 2^9, so we'll need 9 bits from our virtual address to index into this table. Since we already use the lower 12 bits as page offset, we take bits 20:12 here, which for our chosen address yield the value 0x56 ((0x123456000 >> 12) & 0x1ff). Multiply by the TTE size, add to the translation table address, and we know that the TTE that gave us 0x800040000 is written at address 0x8000372b0.

    63                                              21 20      12 11           0
    +------------------------------------------------------------+-------------+
    |                    upper bits                   | L3 index | page offset |
    +------------------------------------------------------------+-------------+
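
The arithmetic for this step can be checked with a few lines. This is only a sketch using the example values from the paragraph above (4KiB pages, 8-byte TTEs, L3 table at 0x800037000); the function names are invented for illustration.

```python
TTE_SIZE = 8
PAGE_SHIFT = 12                      # 4KiB pages: 12 page offset bits
INDEX_BITS = 9                       # 512 entries per table
INDEX_MASK = (1 << INDEX_BITS) - 1   # 0x1ff


def l3_index(va):
    """Bits 20:12 of the virtual address."""
    return (va >> PAGE_SHIFT) & INDEX_MASK


def tte_address(table_base, index):
    """Physical address of entry `index` in the table at `table_base`."""
    return table_base + index * TTE_SIZE


va = 0x123456000
l3_table = 0x800037000
idx = l3_index(va)
print(hex(idx), hex(tte_address(l3_table, idx)))
```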
    

    Now you repeat the same process for how you got 0x800037000, which this time came from a TTE in a level 2 translation table. You again take 9 bits off your virtual address to index into that table, this time with a value of 0x11a ((0x123456000 >> 21) & 0x1ff).

    63                                   30 29      21 20      12 11           0
    +------------------------------------------------------------+-------------+
    |              upper bits              | L2 index | L3 index | page offset |
    +------------------------------------------------------------+-------------+
    

    And once more for a level 1 translation table:

    63                        39 38      30 29      21 20      12 11           0
    +------------------------------------------------------------+-------------+
    |        upper bits         | L1 index | L2 index | L3 index | page offset |
    +------------------------------------------------------------+-------------+
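
Putting the three levels together, the following sketch extracts every field of the example address in one go. It assumes the 4KiB granule and 39-bit layout used here, so the L1 index is bits 38:30, the L2 index bits 29:21 and the L3 index bits 20:12; `table_index` is a hypothetical helper name.

```python
PAGE_SHIFT = 12
INDEX_BITS = 9
INDEX_MASK = (1 << INDEX_BITS) - 1


def table_index(va, level):
    """Index into the level-`level` table (1, 2 or 3) for a 4KiB granule."""
    shift = PAGE_SHIFT + INDEX_BITS * (3 - level)  # 30, 21, 12
    return (va >> shift) & INDEX_MASK


va = 0x123456000
fields = [table_index(va, level) for level in (1, 2, 3)]
offset = va & ((1 << PAGE_SHIFT) - 1)
print([hex(f) for f in fields], hex(offset))
```

Shifting the three indices back into place and adding the offset reproduces the original address, which is a handy sanity check when writing this for real.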
    

    At this point, you used all 39 bits of your virtual address, so you're done. If you had 40-bit addressing, then there'd be another L0 table to go through. If you had 38-bit addressing, then we would've taken the L1 table all the same, but it would only span 0x800 bytes instead of 0x1000.
    But where did the L1 translation table come from? Well, from TTBR0_EL1. Its physical address is just in there, serving as the root for address translation.
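
The rule for which level you start at, and how big the root table is, can be sketched as follows. This is a simplification that only mirrors the reasoning above for the 4KiB granule (12 offset bits, 9 bits per level, last level always L3); it does not model the architecture's actual configuration registers or their constraints, and `root_table` is a made-up name.

```python
def root_table(va_bits, page_shift=12, index_bits=9, tte_size=8):
    """Return (starting level, entry count, size in bytes) of the root table."""
    bits_to_resolve = va_bits - page_shift
    levels = -(-bits_to_resolve // index_bits)   # ceiling division
    start_level = 4 - levels                     # the last level is always L3
    root_bits = bits_to_resolve - (levels - 1) * index_bits
    entries = 1 << root_bits
    return start_level, entries, entries * tte_size


for bits in (38, 39, 40):
    level, entries, size = root_table(bits)
    print(bits, "bits -> root is L%d," % level, entries, "entries,", hex(size), "bytes")
```

With 38 bits the root is still an L1 table but only 0x800 bytes long, and with 40 bits a tiny L0 table appears in front of it, matching the cases described above.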

    Now, to perform the actual translation, you have to do this whole process in reverse. You start with a translation table from TTBR0_EL1, but you don't know a priori whether it's L0, L1, etc. To figure that out, you have to look at the translation granule and the number of bits used for virtual addressing. With 4KiB pages there's a 12-bit page offset and 9 bits for each level of translation tables, so with 39 bits you're looking at an L1 table. Then you take bits 38:30 of the virtual address to index into it, giving you the address of the L2 table. Rinse and repeat with bits 29:21 for L2 and 20:12 for L3, and you've arrived at the physical address of the target page.
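
The whole forward walk can be simulated in a few lines. This is a toy model, not what the hardware literally does: physical memory is a Python dict from address to 64-bit TTE, the L1 and L2 table addresses (0x800035000 and 0x800036000) are made up for the example, and descriptors are simplified to "low two bits 0b11 means valid table or page, bits 47:12 hold the next address". All attribute and permission bits are ignored, and block mappings are left out as in the rest of this answer.

```python
TTE_SIZE = 8
PAGE_SHIFT = 12
INDEX_BITS = 9
INDEX_MASK = (1 << INDEX_BITS) - 1
ADDR_MASK = 0x0000_FFFF_FFFF_F000   # bits 47:12 of a descriptor
VALID = 0b11                        # simplified: valid table/page descriptor


def walk(phys_mem, root_table, va, start_level=1):
    """Translate `va` by walking tables stored in the `phys_mem` dict."""
    table = root_table
    for level in range(start_level, 4):
        shift = PAGE_SHIFT + INDEX_BITS * (3 - level)
        index = (va >> shift) & INDEX_MASK
        tte = phys_mem.get(table + index * TTE_SIZE, 0)
        if tte & 0b11 != VALID:
            raise LookupError("translation fault at level %d" % level)
        table = tte & ADDR_MASK  # next table, or the page itself after L3
    return table | (va & ((1 << PAGE_SHIFT) - 1))


# Hypothetical table locations; only the L3 table address comes from the text.
L1, L2, L3 = 0x800035000, 0x800036000, 0x800037000
phys_mem = {
    L1 + 0x004 * TTE_SIZE: L2 | VALID,
    L2 + 0x11a * TTE_SIZE: L3 | VALID,
    L3 + 0x056 * TTE_SIZE: 0x800040000 | VALID,
}
print(hex(walk(phys_mem, L1, 0x123456abc)))
```

Here `L1` plays the role of the value in TTBR0_EL1. Any address whose path hits a missing entry raises `LookupError`, which stands in for a translation fault.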