Search code examples
postgresqlmvcc

Try to understand MVCC


I'm trying to understand MVCC and can't get it. For example. Transaction1 (T1) try to read some data row. In that same time T2 update the same row.

The flow of transaction is

T1 begin -> T2 begin -> T2 commit -> T1 commit

So the first transaction get it snapshot of database and returns to user result on which he is gonna build other calculation. But as I understand, customer get the old data value? As I understand MVCC, after T1 transaction begins, that transaction doesn't know, that some other transaction change data. So if now user doing some calculation after that (without DB involved), he is doing it on wrong data? Or I'm not right and first transaction have some mechanisms to know that row was changed?

Let's now change the transaction flow.

T2beg -> T1beg -> T2com -> T1com

In 2PL user get the newer version of data because of locks (T1 must wait before exclusive lock released). But in case of MVCC it still will be the old data, as I understand the postgresql MVCC model. So I can get stale data in exchange for speed. Am I right? Or I miss something?

Thank you


Solution

  • Yes, it can happen that you read some old data from the database (that a concurrent transaction has modified), perform a calculation based on this and store “outdated” data in the database.

    This is not a problem, because rather than the actual order in which transactions happen, the logical order is more relevant:

    If T1 reads some data, then T2 modifies the data, then T1 modifies the database based on what it read, you can say that T1 logically took place before T2, and there is no inconsistency.

    You only get an anomaly if T1 and T2 modify the same data: T1 reads data, T2 modifies the data, then T1 modifies the same data based on what it read. This anomaly is called a “lost update”.

    Lost updates can only occur with the weakest (which is the default) isolation level, READ COMMITTED.

    If you want better isolation from concurrent activity, you need to use at least REPEATABLE READ isolation. Then T1 would receive a serialization error when it tries to update the data, and you would have to repeat the transaction. On the second attempt, it would read the new data, and everything will be consistent.