Tags: caching, distributed-computing, cpu-architecture

Contention for read shared data in memory?


I'm currently working my way through Computer Architecture: A Quantitative Approach by Hennessy and Patterson. In Chapter 5 (Thread-Level Parallelism), the authors discuss cache coherence and replication for multiprocessing.

A few pages earlier in the textbook, they tell readers to make the following assumptions:

  1. Processor A writes to memory location X
  2. Processor A writes to memory location Y.
  3. Processor C reading from memory location Y will see the correct value; this implies that Processor C will also see the correct value of memory location X.

The logical conclusion is that these restrictions allow processors to reorder reads, but force each processor to finish its writes in program order.
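This ordering guarantee is what makes the classic flag-and-data handoff pattern safe. Below is a minimal sketch (the variable names and values are mine, not the textbook's): one thread plays Processor A, writing the payload X and then the ready flag Y in program order; another plays Processor C, spinning on Y and then reading X.

```python
import threading

X = 0          # memory location X: the payload
Y = False      # memory location Y: the "ready" flag

def processor_a():
    global X, Y
    X = 42     # write to X first...
    Y = True   # ...then write to Y (writes complete in program order)

def processor_c():
    while not Y:       # spin until the write to Y becomes visible
        pass
    # By the assumption above, seeing Y's new value implies
    # seeing X's new value as well.
    print("Processor C observed X =", X)

a = threading.Thread(target=processor_a)
c = threading.Thread(target=processor_c)
a.start()
c.start()
a.join()
c.join()
```

If a processor could reorder or delay the write to X past the write to Y, Processor C could observe Y as true while X still held its stale value, which is exactly what the coherence assumptions rule out.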

However, a few paragraphs later, when discussing replication as a scheme for enforcing coherence, they say

Replication reduces both latency of access and contention for a read shared data item.

My interpretation is that replicating data to local caches allows multicore processors to reduce latency because of data locality: the data is significantly closer to the processor. I agree with that portion. However, I'm unclear as to why there would be contention for a read-shared data item. That seems to imply a RAR (Read after Read) data hazard, which I know does not really exist.

Unless processors are attempting writes to a shared memory location, why would there be any sort of contention in reading a shared data item?

Edit: There are plenty of posts on Stack Overflow about thread contention, including What is thread contention?, but these almost exclusively use locks as an example. My understanding is that locks are a higher-level application pattern for enforcing coherence. Moreover, all the example answers involve some sort of modification (a write) of the target data item.


Solution

  • Any memory structure has a limited number of access ports, which represent the physical wires that allow you to read or write data. If there is only one read port but multiple agents may read from memory at the same time, they will contend for the port, because only one of them can use it at a time. One way to reduce contention and improve overall bandwidth is to replicate the data in separate physical structures, each with its own access ports. For example, each core may have its own private caches, where multiple copies of the same cache line may exist. In such a design, each core can access its copy of the cache line independently.
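The effect of port contention can be illustrated with a toy cycle-count model (the function and its parameters are my own construction, not from the book): each read occupies a port for one cycle, so with P ports at most P reads complete per cycle. Replicating the data across per-core copies behaves like adding one port per core.

```python
def read_cycles(num_cores, reads_per_core, num_ports):
    """Toy model: total cycles to service all reads when at most
    `num_ports` reads can complete per cycle."""
    total_reads = num_cores * reads_per_core
    return -(-total_reads // num_ports)  # ceiling division

# One shared structure with a single read port: all cores serialize.
shared = read_cycles(num_cores=4, reads_per_core=100, num_ports=1)

# Replicated data, one private copy (and port) per core:
# the same read demand is served in parallel.
replicated = read_cycles(num_cores=4, reads_per_core=100, num_ports=4)

print(shared, replicated)  # 400 100
```

Even though no core ever writes the data, the single-port case makes readers wait on each other; that waiting is the "contention for a read shared data item" that replication removes.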