I understand that a MySQL transaction will let you perform multiple inserts/updates at once where either all or none of them will succeed and I understand how to use them.
What I am not sure about is how MySQL manages to hold all of this data over long periods of time, and what effect this might have on performance.
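For concreteness, this is the kind of all-or-nothing block I mean (the accounts table and amounts here are made up for illustration):

    START TRANSACTION;
    UPDATE accounts SET balance = balance - 10 WHERE id = 1;
    UPDATE accounts SET balance = balance + 10 WHERE id = 2;
    COMMIT;     -- both updates become visible together
    -- (or ROLLBACK; to discard both)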
I have a table of 100 people named "John", and I update every name to "Jane" in a loop inside a transaction. Each update takes 1 second, meaning it takes 100 seconds to finish the transaction.
If I make a select from another process during that 100 seconds the result will be "John" rather than "Jane". If I make a select after the transaction is committed it will return "Jane".
This is all fine and I'm not really confused about how this works.
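Written out as two sessions (assuming the table is called names with id and name columns, as in my queries further down), that first example looks something like this:

    -- Session 1:
    START TRANSACTION;
    UPDATE names SET name = 'Jane' WHERE id = 1;
    -- ... 99 more single-row updates, ~100 seconds in total ...

    -- Session 2, meanwhile (autocommit, no explicit transaction):
    SELECT name FROM names WHERE id = 1;   -- 'John': the update is not committed yet

    -- Session 1:
    COMMIT;

    -- Session 2:
    SELECT name FROM names WHERE id = 1;   -- 'Jane'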
This is the more confusing bit.
I have a table of 100 people named "John" and I start a transaction in which I loop through and select each row one by one. Each select query takes 1 second so this takes 100 seconds.
After 50 seconds another process, not within a transaction, updates every row to "Jane".
In my first process, within the transaction, I will still receive "John" as a result even after the update to "Jane" has completed.
To be clear, the timing would be like so:

00:00 - Process 1 begins the transaction and starts selecting rows, one per second
00:50 - Process 2 (no transaction) updates every row to "Jane"; the update commits immediately
00:51 - Process 1 selects its next row and still gets "John"
01:40 - Process 1 selects its last row (still "John") and the transaction ends
So now two processes could execute

    SELECT name FROM names WHERE id = 31

at the exact same time and have one return "John" and the other return "Jane", depending on whether they were in a transaction, and when that transaction started.
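Condensed into a sketch with two sessions (this is the behaviour I see with default settings):

    -- Session 1:
    START TRANSACTION;
    SELECT name FROM names WHERE id = 1;    -- 'John'

    -- Session 2 (autocommit):
    UPDATE names SET name = 'Jane';         -- commits immediately

    -- Session 1, any time later in the same transaction:
    SELECT name FROM names WHERE id = 31;   -- still 'John'

    -- Session 2, at the same moment:
    SELECT name FROM names WHERE id = 31;   -- 'Jane'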
MySQL must then be storing the value of this field twice in some way.
I don't think it takes a copy of the database or table, since when you begin a transaction it doesn't know which tables you're going to touch. You may not touch a table until 10 minutes into the transaction, and yet the data is as it was 10 minutes ago, no matter how many modifications other processes made in the meantime.
I've also experimented with databases and tables that are gigabytes in size and take minutes to dump; there's no way it's making entire copies.
Perhaps it temporarily holds the value of the field somewhere waiting for the transaction to finish?
It would then need to check if there's a pending value when performing a select.
Therefore doing

    SELECT name FROM names WHERE id = 31

would be the equivalent of something like:
    if (pending_value('names', 'name', 31)) {
        // pending (uncommitted) value: "Jane"
        name = get_pending_value('names', 'name', 31);
    } else {
        // committed value: "John"
        name = get_db_value('names', 'name', 31);
    }
That is obviously very dumb pseudocode, but it's essentially saying "is there a pending update? If yes, use that instead".
This would presumably be held in memory somewhere? Or perhaps a file? Or one of the system databases?
If my names table had 1 billion rows and we performed the same queries, then MySQL would simultaneously know that 1 billion rows held the value "John" and that 1 billion rows held the value "Jane". This must surely impact performance.
But is it the query within the transaction that is impacted or the query outside the transaction?
e.g.

1. Process 1: Begin a transaction
2. Process 1: UPDATE names SET name = "Jane"
3. Process 2: SELECT name FROM names WHERE id = 31 -- "John"
4. Process 2: SELECT name FROM names WHERE id = 31 -- "Jane" (after process 1 commits)

Does the query in step (3) or step (4) have a performance impact, or both?
Some clues:

- The old copies of changed rows are kept until COMMIT or ROLLBACK. (See "history list" in the documentation.) This is row-by-row, not whole table or database. It will not "escalate the row locks to a table lock".
- If a query needs a lock on a row that another transaction has locked -- say an UPDATE, or a SELECT ... FOR UPDATE, of WHERE id = 31 -- it will wait; innodb_lock_wait_timeout controls how long before giving up.
- With autocommit=ON, each statement is a transaction unto itself. (Your last example is missing a BEGIN, in which case Process 2 has 2 separate transactions.)
- In your first example, the isolation mode read_uncommitted would let you see the other transaction's changes as they happened (a way to try this is sketched after this list). That is a rarely used mode. The other modes won't let you see the changes until they are COMMITted, and would never see them if the transaction were ROLLBACK'd. (Yes, there was a copy of each changed row.)
- repeatable_read mode (and others) effectively limits you to seeing only the row versions from your transaction_id or older. Hence, even at 00:51 in your timeline, you still see "John".
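A way to see both of those knobs from a second session (READ UNCOMMITTED and innodb_lock_wait_timeout are standard MySQL; the names table is the question's):

    -- Opt in to dirty reads (rarely a good idea outside of debugging):
    SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
    SELECT name FROM names WHERE id = 31;  -- may show 'Jane' before the other session commits

    -- How long a blocked statement waits for a row lock (default 50 seconds):
    SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';
    SET SESSION innodb_lock_wait_timeout = 5;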
General advice:

- Use SELECT ... FOR UPDATE where appropriate -- this adds a stronger lock on the rows in the SELECT, just in case they will be updated or deleted in the transaction. (Sketched after this list.)
- Use a single INSERT adding 100 rows; that will be 10 times as fast as 100 single-row INSERTs. (Similarly for UPDATE and DELETE.)
- SHOW ENGINE InnoDB STATUS; (I find it useful in dealing with deadlocks, but cryptic for other purposes.)
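A sketch of the first two points against the question's names table:

    -- Lock the row up front so no other transaction can change it before our UPDATE:
    START TRANSACTION;
    SELECT name FROM names WHERE id = 31 FOR UPDATE;
    UPDATE names SET name = 'Jane' WHERE id = 31;
    COMMIT;

    -- One multi-row INSERT instead of 100 single-row INSERTs:
    INSERT INTO names (id, name) VALUES
        (101, 'John'),
        (102, 'John'),
        (103, 'John');   -- ...and so on, up to 100 rows per statement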