Search code examples
rdbmsaciddurability

Does a typical ACID RDBMS sync to disk every commit?


The 'D' in ACID means "Durability" which is defined by Wikipedia as: "every committed transaction is protected against power loss/crash/errors and cannot be lost by the system and can thus be guaranteed to be completed".

However, this would mean that every transaction must be sync'd to disk before being reported as successful, not just flushed. ('flush'=sent to the operating system level, 'sync'=sent to the physical disk platter). This would make it impossible to implement a high transaction rate RDBMS.

Do popular RDBMS's really sync every transaction?


Solution

  • Databases that use disk for persistence do indeed have to write to the disk to make the transaction durable. Moreover, they also have to sync to the disk to avoid any loss from a write back cache.

    To achieve high performance, databases will use group commits whereby multiple transactions in a commit cycle will use the same write/sync operation to make all of the transactions durable. This is possible where they are all appending to the same transaction log.

    This may mean that the response of an individual commit may be delayed (while waiting for others to join the commit cycle) but the overall throughput is much greater across the whole database because the cost of the write/sync is amortized across multiple transactions. For example, each individual transaction may take 10mS, but thousands of transactions are all able to commit in the same cycle.

    It is normal for a database to measure how many transactions are active to judge whether it is worth making any single transaction wait for others to join the commit cycle such that on a very lightly loaded system, a transaction need not wait for others.

    Not all databases use disk to guarantee durability. For example, VoltDB relies on copies of the transaction held in memory across multiple servers. If one of the servers dies, the transaction is still available elsewhere. Hence a transaction only needs to ensure that transaction has been transmitted to a sufficient population of servers to ensure durability.

    This also raises the question of what is durable? Is a single disk durable? Not if the disk fails. Is a RAID array durable? Not if there is a catastrophic RAID corruption. The only guarantee of durability is where transactions are replicated across multiple remote database instances - but not not everybody needs that level of guarantee. Durability should not be considered a binary option but rather as a choice of level of durability.