performance x86 cpu-architecture cpu-cache micro-optimization

Cycles/cost for L1 Cache hit vs. Register on x86?


I remember assuming that an L1 cache hit is 1 cycle (i.e. identical to register access time) in my architecture class, but is that actually true on modern x86 processors?

How many cycles does an L1 cache hit take? How does it compare to register access?


Solution

  • Here's a great article on the subject:

    http://arstechnica.com/gadgets/reviews/2002/07/caching.ars/1

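    To answer your question: roughly, yes. An L1 cache hit is the same order of magnitude as a register access, but it is not identical: on the Core i7 figures below it is ~4 cycles, whereas a register operand typically adds no latency of its own. And of course a cache miss is quite costly ;)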

    PS:

    The specifics will vary, but this link has some good ballpark figures:

    Approximate cost to access various caches and main memory?

    Core i7 Xeon 5500 Series Data Source Latency (approximate)
    L1 CACHE hit, ~4 cycles
    L2 CACHE hit, ~10 cycles
    L3 CACHE hit, line unshared ~40 cycles
    L3 CACHE hit, shared line in another core ~65 cycles
    L3 CACHE hit, modified in another core ~75 cycles
    Remote L3 CACHE ~100-300 cycles
    Local DRAM ~30 ns (~120 cycles)
    Remote DRAM ~100 ns 
    

    PPS:

    These figures represent much older, slower CPUs, but the ratios basically hold:

    http://arstechnica.com/gadgets/reviews/2002/07/caching.ars/2

    Level                     Access Time  Typical Size   Technology   Managed By
    -----                     -----------  ------------   ----------   ----------
    Registers                 1-3 ns       ≤1 KB          Custom CMOS  Compiler
    Level 1 Cache (on-chip)   2-8 ns       8 KB - 128 KB  SRAM         Hardware
    Level 2 Cache (off-chip)  5-12 ns      0.5 MB - 8 MB  SRAM         Hardware
    Main Memory               10-60 ns     64 MB - 1 GB   DRAM         Operating System
    Hard Disk                 3M - 10M ns  20 - 100 GB    Magnetic     Operating System/User