amazon-web-services · amazon-s3 · consistency · eventual-consistency · data-consistency

AWS S3 Eventual Consistency and read after write Consistency


Help me to better understand these concepts, which I can't fully grasp.

Talking about AWS S3 consistency models, I'll try to explain what I've grasped.

Please demystify or confirm these claims.

First of all:

  • talking about "read after write" relates only to "new writes", i.e. the creation of objects that didn't exist before.
  • talking about "eventual consistency" relates to modifying existing objects (updating or deleting).

Are these first concepts correct? Then,

  • eventual consistency: a client that accesses a datum before it has been completely written to a node can read an old version of the object, because the write may still be in progress and the object might not have been committed yet. This behavior is universally tolerated in distributed systems, where this type of consistency is preferred to the alternative of waiting for some sort of lock to be released once the object has been committed.

  • read after write consistency: objects are immediately available to clients, and a client will read the "real" version of the object, never an old version. If I've understood correctly, this is true only for new objects.

If so, why are these replication methods so different, and why do they produce different consistency guarantees?

The concept of "eventual consistency" is the more natural one to grasp, because you have to consider the latency needed to propagate the data to the different nodes, and a client might access the data during this window and not get the fresh data yet.

But why "read after write" should be immediate? to propagate a modification on an existing datum, or create a new datum, should have the same latency. I can't understood the difference.

Can you please tell me whether my claims are correct, and explain this concept in a different way?


Solution

  • talking about "read after write" is related only to "new writings"/creation of objects that didn't exist before.

    Yes

    talking about "eventual consistency" is related to "modifying existing objects" (updating or deleting)

    Almost correct, but be aware of one caveat. Here is a quote from the documentation:

    The caveat is that if you make a HEAD or GET request to a key name before the object is created, then create the object shortly after that, a subsequent GET might not return the object due to eventual consistency.
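This caveat is easier to see if you imagine that the "object does not exist" answer itself gets cached (so-called negative caching). Here is a minimal pure-Python sketch of that idea — `ToyStore` and everything in it is hypothetical, not S3's actual implementation:

```python
class ToyStore:
    """Toy model of a store with a cache that also remembers misses."""

    def __init__(self):
        self.backend = {}  # authoritative storage
        self.cache = {}    # may hold stale entries, including "not found"

    def get(self, key):
        if key in self.cache:
            return self.cache[key]     # served from cache, possibly stale
        value = self.backend.get(key)  # cache miss: fetch the latest state
        self.cache[key] = value        # remember the result, even None
        return value

    def put(self, key, value):
        # The write is durable immediately, but the cache entry
        # created by an earlier GET is not invalidated here.
        self.backend[key] = value


store = ToyStore()
print(store.get("photo.jpg"))  # None -- and "not found" is now cached
store.put("photo.jpg", b"data")
print(store.get("photo.jpg"))  # still None: the stale negative entry wins
```

If you never GET the key before creating it, there is no negative cache entry, and the first GET after the PUT is a cache miss that fetches the fresh object — which is exactly the read-after-write case the documentation describes.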

    Regarding why they offer different consistency models, here is my understanding/speculation. (Note: the following might be wrong, since I've never worked on S3 and don't know its actual internal implementation.)

    S3 is a distributed system, so it's very likely that S3 uses some internal caching service. Think of how a CDN works; you can use a similar analogy here. In the case where you GET an object whose key is not in the cache yet, it's a cache miss! S3 will fetch the latest version of the requested object, save it into the cache, and return it to you. This is the read-after-write model.

    On the other hand, if you update an object that's already in the cache, then besides replicating your new object to other availability zones, S3 needs to do extra work to update the existing data in the cache. The propagation process will therefore likely take longer. Instead of making you wait on the request, S3 made the decision to return the existing data in the cache, which might be an old version of the object. This results in eventual consistency.
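The cache-miss reasoning can be sketched the same way. In this toy model (again hypothetical, not S3's real architecture), a brand-new key misses the cache and is fetched fresh, while an overwrite keeps serving the cached old version until the cache is eventually refreshed:

```python
class CachedStore:
    """Toy cache-in-front-of-storage model (hypothetical, not S3's design)."""

    def __init__(self):
        self.storage = {}  # authoritative, always up to date
        self.cache = {}    # refreshed lazily, so it can lag behind

    def put(self, key, value):
        self.storage[key] = value  # durable write; cache refresh lags behind

    def get(self, key):
        if key in self.cache:
            return self.cache[key]  # cache hit: possibly a stale version
        value = self.storage[key]   # cache miss: fetch the latest version
        self.cache[key] = value
        return value

    def refresh(self, key):
        # Stand-in for the asynchronous propagation that "eventually"
        # brings the cache in line with storage.
        self.cache[key] = self.storage[key]


s = CachedStore()
s.put("k", "v1")
print(s.get("k"))  # "v1" -- new key, cache miss: read-after-write
s.put("k", "v2")
print(s.get("k"))  # "v1" -- cached old version: eventual consistency
s.refresh("k")
print(s.get("k"))  # "v2" -- the new version eventually shows up
```

In a real system the explicit `refresh()` call would be replaced by asynchronous cache invalidation and replication, and the time before it completes is exactly the window in which clients can observe stale data.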

    As Phil Karlton said, there are only two hard things in Computer Science: cache invalidation and naming things. AWS has no good way to fully get around this, and has to make some compromises too.