The Intel optimization manual (September 2019 revision) lists a 48 KiB, 8-way set-associative L1 data cache for the Ice Lake microarchitecture.
> ¹ Software-visible latency/bandwidth will vary depending on access patterns and other factors.
This baffled me because:

1. With 64-byte lines, 48 KiB / 8 ways / 64 B = 96 sets, which is not a power of two, so the set index is no longer a simple bit field of the address.
2. Indexing 96 sets takes 7 bits; together with the 6 line-offset bits that is 13 bits, exceeding the 12-bit page offset, so the cache could no longer be indexed entirely from page-offset bits (the usual VIPT property).
All in all, it seems the cache became more expensive to handle, yet the latency increased only slightly (if it increased at all, depending on what exactly Intel means by that number).
With a bit of creativity I can still imagine a fast way to index 96 sets, but point two looks like an important breaking change to me.
What am I missing?
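For concreteness, here is the arithmetic behind the two points as a minimal sketch (the 64-byte line size and 4 KiB pages are my assumptions, taken from previous Intel microarchitectures):

```python
import math

LINE_SIZE = 64          # bytes per cache line (assumed)
PAGE_OFFSET_BITS = 12   # 4 KiB pages (assumed)

def sets(size_bytes, ways, line=LINE_SIZE):
    """Number of sets in a set-associative cache."""
    return size_bytes // (ways * line)

# Geometry as printed in the manual: 48 KiB, 8-way.
s = sets(48 * 1024, 8)
print(s)                          # 96 sets
print((s & (s - 1)) == 0)         # False: 96 is not a power of two
index_bits = math.ceil(math.log2(s))
offset_bits = int(math.log2(LINE_SIZE))
print(index_bits + offset_bits)   # 13 bits needed, > 12-bit page offset
```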
The optimization manual is wrong.
According to the CPUID instruction, the associativity is 12 (measured on a Core i5-1035G1). See also uops.info/cache.html and en.wikichip.org/wiki/intel/microarchitectures/ice_lake_(client).
This means that there are 48 KiB / (12 ways × 64 B) = 64 sets, the same as in previous microarchitectures: 6 index bits plus 6 offset bits still fit within the 12-bit page offset, so the VIPT indexing property is preserved.
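CPUID leaf 4 (deterministic cache parameters) encodes ways−1 in EBX[31:22], partitions−1 in EBX[21:12], line size−1 in EBX[11:0], and sets−1 in ECX. A small decoder, fed with register values consistent with the 12-way geometry (the raw values below are constructed for illustration, not read from a real chip):

```python
def decode_cpuid_leaf4(ebx, ecx):
    """Decode cache geometry from CPUID leaf 4 register values."""
    ways = ((ebx >> 22) & 0x3FF) + 1
    partitions = ((ebx >> 12) & 0x3FF) + 1
    line_size = (ebx & 0xFFF) + 1
    sets = ecx + 1
    size = ways * partitions * line_size * sets
    return ways, sets, line_size, size

# Register values matching a 12-way, 64-set, 64 B/line L1d
# (illustrative constants; a real query executes the CPUID instruction).
ebx = (11 << 22) | (0 << 12) | 63   # ways-1=11, partitions-1=0, line-1=63
ecx = 63                            # sets-1
ways, sets, line, size = decode_cpuid_leaf4(ebx, ecx)
print(ways, sets, line, size)       # 12 64 64 49152 (= 48 KiB)
```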