In Intel's manual, the following section confuses me:
11.5.6.2 Shared Mode
In shared mode, the L1 data cache is competitively shared between logical processors. This is true even if the logical processors use identical CR3 registers and paging modes. In shared mode, linear addresses in the L1 data cache can be aliased, meaning that one linear address in the cache can point to different physical locations. The mechanism for resolving aliasing can lead to thrashing. For this reason, IA32_MISC_ENABLE[bit 24] = 0 is the preferred configuration for processors based on the Intel NetBurst microarchitecture that support Intel Hyper-Threading Technology.
Since Intel uses VIPT (which is equivalent to PIPT here) to access the cache, how can cache aliasing happen?
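To make the "VIPT equals PIPT" premise concrete: when the line-offset and set-index bits all lie inside the page offset, address translation cannot change them, so indexing with the virtual address selects the same set the physical address would. The sketch below assumes the 32 KiB, 8-way, 64-byte-line L1D of many recent Intel cores and 4 KiB pages; the geometry and both addresses are illustrative assumptions, not taken from the quoted manual section.

```python
# Assumed L1D geometry (illustrative only): 32 KiB, 8-way set associative,
# 64-byte lines, with 4 KiB pages.
CACHE_BYTES = 32 * 1024
WAYS = 8
LINE_BYTES = 64
PAGE_BYTES = 4096

NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)   # 64 sets
OFFSET_BITS = LINE_BYTES.bit_length() - 1       # 6 bits of line offset
INDEX_BITS = NUM_SETS.bit_length() - 1          # 6 bits of set index
PAGE_OFFSET_BITS = PAGE_BYTES.bit_length() - 1  # 12 bits that paging never changes

def set_index(addr: int) -> int:
    """Set index taken from address bits [OFFSET_BITS, OFFSET_BITS + INDEX_BITS)."""
    return (addr >> OFFSET_BITS) & (NUM_SETS - 1)

# Offset + index bits fit inside the page offset, so the virtual and
# physical addresses of the same byte always select the same set.
assert OFFSET_BITS + INDEX_BITS <= PAGE_OFFSET_BITS

va = 0x7FFF_1234_5ABC                            # hypothetical virtual address
pa = 0x0009_A345_6000 | (va & (PAGE_BYTES - 1))  # hypothetical frame, same page offset
print(set_index(va), set_index(pa))              # prints: 42 42
```

With this geometry, aliasing cannot come from the index itself, which is why the answer has to look at something other than the address.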
Based on Intel® 64 and IA-32 Architectures Optimization Reference Manual, November 2009 (248966-020), Section 2.6.1.3:
Most resources in a physical processor are fully shared to improve the dynamic utilization of the resource, including caches and all the execution units. Some shared resources which are linearly addressed, like the DTLB, include a logical processor ID bit to distinguish whether the entry belongs to one logical processor or the other.
The first level cache can operate in two modes depending on a context-ID bit:
- Shared mode: The L1 data cache is fully shared by two logical processors.
- Adaptive mode: In adaptive mode, memory accesses using the page directory [are] mapped identically across logical processors sharing the L1 data cache.
Aliasing is possible because shared mode uses the processor-ID/context-ID bit (just a bit indicating which logical processor issued the memory access), and that bit differs between the two threads. Adaptive mode simply addresses the cache as one would normally expect, using only the memory address.
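A toy model of that distinction, as a minimal sketch: treat each cache entry as identified by a lookup key. In shared mode the key keeps the context-ID bit, so one linear address touched by both logical processors produces two distinct entries (the address is aliased); in adaptive mode the bit is dropped and both threads share one entry. `LookupKey`, `lookup_key`, and `entries_needed` are hypothetical names, the 64-byte line size is an assumption, and real hardware compares tags within a set rather than building Python tuples.

```python
from typing import NamedTuple

LINE_BYTES = 64  # assumed line size

class LookupKey(NamedTuple):
    line: int    # linear address with the line offset dropped
    ctx_id: int  # context-ID bit of the issuing logical processor (0 or 1)

def lookup_key(linear_addr: int, ctx_id: int, shared_mode: bool) -> LookupKey:
    """Toy model: shared mode keeps the context-ID bit, adaptive mode drops it."""
    return LookupKey(linear_addr // LINE_BYTES, ctx_id if shared_mode else 0)

def entries_needed(linear_addr: int, shared_mode: bool) -> int:
    """Distinct entries created when both logical processors touch one linear address."""
    return len({lookup_key(linear_addr, ctx, shared_mode) for ctx in (0, 1)})

addr = 0x0804_A040  # hypothetical linear address used by both threads
print("shared:", entries_needed(addr, True))     # shared: 2
print("adaptive:", entries_needed(addr, False))  # adaptive: 1
```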
Exactly how the processor ID is used when indexing the cache in shared mode appears to be undocumented. (XORing it with several address bits would disperse the indexes, so that adjacent indexes for one hardware thread would map to more widely separated indexes for the other thread. Using a different bit order for each thread is less likely, since that would tend to add delay. Dispersal reduces conflict frequency when spatial locality is coarser than a cache line but finer than the way size.)
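A sketch of the XOR idea from that parenthetical, assuming 64 sets and 64-byte lines (so the way size is 4 KiB) and an arbitrary mask; `THREAD_MASK`, `set_index`, and `contended_sets` are hypothetical, since the real scheme is not documented. Both threads walk the same linear addresses, as they would with identical stack layouts. Without the XOR they contend for the same sets; with it, an 8-line run for thread 1 moves to a disjoint group of sets. For a region as large as the way size, every set is used by both threads either way, which is why dispersal only helps for locality between line size and way size.

```python
NUM_SETS = 64           # assumed: 6 index bits
LINE_BYTES = 64         # assumed line size; way size = 64 * 64 B = 4 KiB
THREAD_MASK = 0b101000  # hypothetical index bits flipped for thread 1

def set_index(linear_addr: int, thread_id: int, mask: int = THREAD_MASK) -> int:
    """Hypothetical shared-mode index: XOR the thread bit into several index bits."""
    base = (linear_addr // LINE_BYTES) % NUM_SETS
    return base ^ (mask if thread_id else 0)

def contended_sets(addrs: list[int], mask: int) -> int:
    """Number of sets used by both threads when each touches the same addresses."""
    t0 = {set_index(a, 0, mask) for a in addrs}
    t1 = {set_index(a, 1, mask) for a in addrs}
    return len(t0 & t1)

start = 0x7FFF_E000  # hypothetical page-aligned linear address
run = [start + i * LINE_BYTES for i in range(8)]        # 512 B: between line and way size
region = [start + i * LINE_BYTES for i in range(64)]    # 4 KiB: one full way

print([set_index(a, 1) for a in run])        # [40, 41, 42, 43, 44, 45, 46, 47]
print(contended_sets(run, 0))                # 8: no dispersal, sets 0..7 shared
print(contended_sets(run, THREAD_MASK))      # 0: thread 1 moved to sets 40..47
print(contended_sets(region, THREAD_MASK))   # 64: XOR only permutes a full way
```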