I have asked a similar question: Can a lower level cache have higher associativity and still hold inclusion?
Suppose we have 2-level of cache. (L1 being nearest to CPU (inner / lower-level) and L2 being outside that, nearest to main memory) can L1 cache be write back?
My attempt)
I think we must have only write through cache and we cannot have write back cache in L1. If a block is replaced in the L1 cache then it has to be written back to L2 and also to main memory in order to hold inclusion. Hence it has to be write through and not write back.
All these doubts arise from the below exam question. :P
Question) For inclusion to hold between two cache levels L1 and L2 in a multi-level cache hierarchy which of the following are necessary?
I) L1 must be write-through cache
II) L2 must be a write-through cache
III) The associativity of L2 must be greater than that of L1
IV) The L2 cache must be at least as large as the L1 cacheA) IV only
B) I and IV only
C) I, II and IV only
D) I, II, III and IV
As per my understanding, the answer needs to be Option (B)
Real life counterexample: Intel i7 series (since Nehalem) have a large shared (between cores) L3 that's inclusive. And all levels are write-back (including the per-core private L2 and L1d) to reduce bandwidth requirements for outer caches.
Inclusive just means that the outer cache tags have a state other than Invalid for every line in a valid state in any inner cache. Not necessarily that the data is also kept in sync. https://en.wikipedia.org/wiki/Cache_inclusion_policy calls that "value inclusion", and yes it does require a write-through (or read-only) inner cache. That's Option B, and is even stronger than just "inclusive".
My understanding of regular inclusion, specifically in Intel i7, is that data can be stale but tags are always inclusive. Moreover, since this is a multi-core CPU, L3 tags tell you which core's private L2/L1d cache owns a line in Exclusive or Modified state, if any. So you know which one to talk to if another core wants to read or write the line. i.e. it works as a snoop filter for those multi-core CPUs.
And conversely, if there are no tag matches in the inclusive L3 cache, the line is definitely not present anywhere on chip. (So an invalidate message doesn't need to be passed on to every core.) See also Which cache mapping technique is used in intel core i7 processor? for more details.
To write a line, the inner cache has to fetch / RFO it through the outer cache so it has a chance to maintain inclusion that way as it handles the RFO (read for ownership) from the L1d/L2 write miss (not in Exclusive or Modified state).
Apparently this is not called "tag-inclusive"; that term may have some other technical meaning. I think I saw it used and made a wrong(?) assumption about what it meant. What is tag-only forced cache inclusion called? suggests "tag-inclusive" doesn't mean tags but no data either.
Having a line in Modified state in the inner cache (L1) means an inclusive outer cache will have a tag match for that line, even if the actual data in the outer cache is stale. (I'm not sure what state caches typically use for this case; according to @Hadi in comments it's not Invalid. I assume it's not Shared either because it needs to avoid using this stale data to satisfy read requests from other cores.)
When the data does eventually write back from L1, it can be in Modified state only in the outer cache, evicted from L1.