Tags: database, distributed, distributed-caching

Why is a distributed in-memory cache faster than a database query?


The article https://medium.com/@i.gorton/six-rules-of-thumb-for-scaling-software-architectures-a831960414f9 says this about distributed caches:

Better, why query the database if you don’t need to? For data that is frequently read and changes rarely, your processing logic can be modified to first check a distributed cache, such as a memcached server. This requires a remote call, but if the data you need is in cache, on a fast network this is far less expensive than querying the database instance.
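
As I read it, that advice amounts to a cache-aside read path. A minimal sketch of what I have in mind, using pymemcache (the host name and query_database are hypothetical stand-ins, not anything from the article):

    import json
    from pymemcache.client.base import Client

    cache = Client(("memcached.internal", 11211))  # assumed cache host

    def query_database(product_id):
        # Hypothetical stand-in for the real (slow) database query.
        return {"id": product_id, "name": "example"}

    def get_product(product_id):
        key = f"product:{product_id}"
        cached = cache.get(key)               # one remote call to the cache
        if cached is not None:
            return json.loads(cached)         # hit: no database work at all
        product = query_database(product_id)  # miss: fall back to the database
        cache.set(key, json.dumps(product), expire=300)  # keep it warm for 5 minutes
        return product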

The claim is that a distributed in-memory cache is faster than querying the database. Latency Numbers Every Programmer Should Know ranks the relevant operations like this: Main memory reference <<< Round trip within same datacenter < Read 1 MB sequentially from SSD <<< Send packet CA->Netherlands->CA.

I interpret a network call to the distributed cache as "Send packet CA->Netherlands->CA", since the cached data may not be in the same datacenter. Am I wrong? Should I instead assume the replication factor is high enough that cached data is available in every datacenter, so the comparison between a distributed cache and a database is really "Round trip within same datacenter" vs "Read 1 MB sequentially from SSD"?
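
To make the comparison concrete, here are the rough (and somewhat dated) figures from that list, with the two comparisons I am asking about:

    # Approximate figures from "Latency Numbers Every Programmer Should Know"
    main_memory_ref_ns   = 100          # main memory reference
    same_dc_roundtrip_ns = 500_000      # round trip within same datacenter (0.5 ms)
    ssd_read_1mb_ns      = 1_000_000    # read 1 MB sequentially from SSD (1 ms)
    ca_nl_ca_ns          = 150_000_000  # send packet CA->Netherlands->CA (150 ms)

    # Cache in the same datacenter vs. reading 1 MB from a local SSD:
    print(ssd_read_1mb_ns / same_dc_roundtrip_ns)  # 2.0   -> cache wins, modestly

    # Cache in a remote datacenter vs. reading 1 MB from a local SSD:
    print(ca_nl_ca_ns / ssd_read_1mb_ns)           # 150.0 -> remote cache loses badly

If the second comparison is the right one, the article's advice seems backwards, which is why I am unsure which topology to assume.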


Solution

  • Databases typically have to read data from disk, which is slow. Most will cache some data in memory, which makes frequently run queries faster, but there are still other overheads, such as:

    • query parsing and syntax checking
    • database permission/authorisation checking
    • data access plan creation by the optimizer
    • a quite chatty protocol, especially when multiple rows are returned

    All of which add latency.

    Caches have none of these overheads. In the general case there are far more reads than writes, and the cache always has a value available in memory (unless it is a cold miss). Writing to the cache doesn't block reads of the current value; synchronised writes just mean a slight delay between the write request and the new value being visible everywhere.
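
    As a rough way to see the difference yourself, here is a sketch of a timing harness. It assumes a local memcached, a local PostgreSQL instance, and a hypothetical users table; the absolute numbers will vary by setup, but the single-key cache get avoids all of the per-query overheads listed above:

        import time
        import psycopg2
        from pymemcache.client.base import Client

        cache = Client(("localhost", 11211))
        conn = psycopg2.connect("dbname=app")  # assumed connection string
        cache.set("user:42", "alice")          # pre-warm the cache

        def timed(label, fn, n=1000):
            start = time.perf_counter()
            for _ in range(n):
                fn()
            print(f"{label}: {(time.perf_counter() - start) / n * 1e6:.1f} us/op")

        def from_cache():
            cache.get("user:42")  # single key lookup: no parsing, planning or permissions

        def from_db():
            with conn.cursor() as cur:  # parse + authorise + plan + row protocol
                cur.execute("SELECT name FROM users WHERE id = %s", (42,))
                cur.fetchone()

        timed("memcached get", from_cache)
        timed("postgres query", from_db)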