caching, x86-64, cpu-architecture, cpu-cache, virtual-address-space

Performance implications of aliasing in VIPT cache


What are the performance implications of virtual address synonym (aliasing) in a VIPT cache? I'm specifically interested in recent x86_64 architectures but knowing more about others wouldn't hurt.

Specifically, if I mmap the same file twice, will it cause more L1 misses, or do Intel / AMD use page coloring or something else to prevent that? Is there a way I could mmap to fixed addresses to help?
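
For concreteness, this is the kind of double mapping I mean (a minimal sketch; the file name and length are placeholders, and error checks are omitted):

    #include <stdio.h>
    #include <fcntl.h>
    #include <sys/mman.h>

    int main(void)
    {
        int fd = open("data.bin", O_RDONLY);   /* placeholder file */
        if (fd < 0) return 1;
        size_t len = 1 << 20;                  /* assume the file is at least 1 MiB */

        /* Two mappings of the same physical pages at two different virtual addresses. */
        char *a = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
        char *b = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);

        /* Touching the same file offset through both mappings: do the two
           virtual addresses compete for / duplicate L1 cache lines, or not? */
        volatile char x = a[0] + b[0];
        (void)x;
        return 0;
    }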


Solution

  • Intel makes its L1 caches small enough and associative enough that homonym and synonym aliasing are impossible: VIPT behaves exactly like PIPT, just faster. (All the index bits come from the low 12 offset-within-page bits of the address, which are identical in the virtual and physical address, so it's immune to aliasing problems. Page-number bits only go into the tags in L1i/L1d.)

    This is true for at least P6-family and Sandybridge-family. The Gracemont cores in Alder Lake have 64K 8-way L1i caches (https://chipsandcheese.com/2021/12/21/gracemont-revenge-of-the-atom-cores/); each way is 8K, so one index bit comes from the virtual page number and the same physical line can be cached in two different sets. I-caches are read-only, so that's probably not a correctness problem. The L1d caches are 32K 8-way, the same as older P6-family, so there are no homonym or synonym problems with x86's 4K page size.

    AMD has done it differently, with big 64K low-associativity L1 caches in some designs. They avoid the correctness problems in hardware, without requiring the OS to do page colouring for correctness (see Virtually indexed physically tagged cache Synonym).

    But there have been L1i performance issues when multiple processes map the same shared library at different randomized virtual addresses. https://www.phoronix.com/review/amd_bulldozer_aliasing briefly describes how Linux worked around that problem by clearing bits [14:12] of the ELF base virtual address. (Those bits don't necessarily correspond to the physical address, so this is different from page coloring, but similar in spirit.)

    Zen 1 also used a larger L1i (64K 4-way), but a 32K 8-way L1d. See also How is AMD's micro-tagged L1 data cache accessed?
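
    To make the geometry arguments above concrete, here is a small sketch of the index-bit arithmetic for the cache configurations mentioned so far (64-byte lines and a 4K page size are assumed):

        /* Which address bits index each cache?  Index bits above bit 11 come
           from the virtual page number, so they can differ between two virtual
           mappings of the same physical page (synonym aliasing). */
        #include <stdio.h>

        static void index_bits(const char *name, unsigned size, unsigned ways)
        {
            unsigned line = 64;                    /* 64-byte lines assumed */
            unsigned sets = size / (ways * line);
            unsigned top  = 6;                     /* bit 6 = lowest index bit */
            while ((1u << (top - 6)) < sets)
                top++;                             /* top = one past the highest index bit */
            printf("%-26s index bits [%2u:6] -> %s\n", name, top - 1,
                   top - 1 <= 11 ? "inside the 4K page offset (VIPT behaves as PIPT)"
                                 : "uses virtual page-number bits (aliasing possible)");
        }

        int main(void)
        {
            index_bits("Intel / Zen 32K 8-way L1d", 32 * 1024, 8);  /* bits [11:6] */
            index_bits("Gracemont   64K 8-way L1i", 64 * 1024, 8);  /* bits [12:6] */
            index_bits("Bulldozer   64K 2-way L1i", 64 * 1024, 2);  /* bits [14:6] */
            index_bits("Zen 1       64K 4-way L1i", 64 * 1024, 4);  /* bits [13:6] */
            return 0;
        }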

    There are apparently various techniques for building larger or less-associative L1d caches that don't get the VIPT-equals-PIPT behaviour for free; micro-tags are one technique that I think can help.


    Is there a way I could mmap to fixed addresses to help?

    For data caches: no, it's not necessary, unless there are other CPUs with bigger but not-associative-enough L1d caches.

    For executable mappings, aligning to a multiple of 8 pages (clearing bits [14:12]) could help for I-caches on Bulldozer-family, or 4 pages (bits [13:12]) for Zen 1's 64K 4-way L1i, up from Bulldozer's 2-way. (A sketch of one way to do that is at the end of this answer.)

    Zen 2 through 4 (https://en.wikichip.org/wiki/amd/microarchitectures/zen_3#Memory_Hierarchy) have L1d/L1i caches that are both 32K 8-way, so no problem there.
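
    If you do want to experiment with aligned mappings, here is a minimal sketch of one way to get a 32K-aligned file mapping on Linux: reserve a larger PROT_NONE window, then MAP_FIXED inside it. (This assumes Linux with 4K pages; mmap_aligned is an illustrative helper, not a standard API, the file name and length are placeholders, and error handling is mostly omitted.)

        #include <stdint.h>
        #include <stdio.h>
        #include <fcntl.h>
        #include <sys/mman.h>

        #define ALIGN (32 * 1024)   /* 8 pages: keeps bits [14:12] of the base clear */

        /* Map the first `len` bytes of `fd` at a 32K-aligned virtual address. */
        static void *mmap_aligned(int fd, size_t len)
        {
            /* Reserve a window big enough to contain an ALIGN-aligned block. */
            char *res = mmap(NULL, len + ALIGN, PROT_NONE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (res == MAP_FAILED)
                return MAP_FAILED;

            uintptr_t base = ((uintptr_t)res + ALIGN - 1) & ~(uintptr_t)(ALIGN - 1);

            /* MAP_FIXED is safe here because we own the reserved range. */
            void *p = mmap((void *)base, len, PROT_READ | PROT_EXEC,
                           MAP_PRIVATE | MAP_FIXED, fd, 0);

            /* Give back the unused parts of the reservation. */
            if (base > (uintptr_t)res)
                munmap(res, base - (uintptr_t)res);
            munmap((char *)base + len, (uintptr_t)res + ALIGN - base);
            return p;
        }

        int main(void)
        {
            int fd = open("libfoo.so", O_RDONLY);   /* placeholder file name */
            if (fd < 0) return 1;
            void *p = mmap_aligned(fd, 64 * 1024);  /* assumes the file is >= 64K */
            printf("mapped at %p, bits [14:12] = %lu\n", p,
                   (unsigned long)(((uintptr_t)p >> 12) & 7));
            return 0;
        }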