Isolation levels and not skipping any data

Say we want to read new events from our MySQL (InnoDB) table event. We remember the last (auto-increment) ID we have seen, and query WHERE id > @LastSeenId.

What if IDs 1 through 10 are available, except that ID 4 is yet uncommitted?

Clearly, it is important that we never skip any rows that may exist. If we were to skip ID 4, we will never see it again, and will have missed it permanently, which must be prevented.

I presume this depends on the isolation level the query runs in.

1. Do I understand correctly that Serializable (and no other level) will provide the required behavior?

That is, it will wait for commit/rollback of any uncommitted transactions affecting the rows matching the condition (id > @LastSeenId)?

2. More specifically, if we are not using database transactions explicitly, is the outcome determined by the default isolation level - even for a single SELECT query?

For context, we are using the official MySQL connector for .NET.

Solution

This is something you can test yourself with the mysql client and two windows.

Open window 1, get into the mysql client, create a table, and fill it with values that will be committed.

mysql1> use test;

mysql1> create table event (id serial primary key);

mysql1> insert into event values (1), (2), (3), (5);

Now start a transction. Insert value 4, as in your example.

mysql1> begin;

mysql1> insert into event values (4);

Do not commit the last insert yet.

Open window 2, get into the mysql client, set the transaction isolation for your session, and query for the range of data.

mysql2> use test;

mysql2> set tx_isolation = serializable;

mysql2> begin;

mysql2> select * from event where id >= 1;

The select hangs at this stage, waiting.

This is because in serializable level, all select queries implicitly try to acquire a shared lock, as if you had added the LOCK IN SHARE MODE (or FOR SHARE in MySQL 8.0 syntax) clause to the end of the select query. This is a locking read. It's trying to acquire a gap lock on the range of id values, but it can't get that lock yet, because there's an uncommitted row created by window 1, which falls in the range.

Now in window 1, commit the transaction (make sure you do this before window 2 times out after 50 seconds):

mysql1> commit;

Window 2 immediately returns, and now it sees the full set of data values.

mysql2> select * from event where id >= 1;
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
|  4 |
|  5 |
+----+

Now try to insert a new row in Window 1:

mysql1> begin;

mysql1> insert into event values (6);

Now this window hangs. Why? Because it's trying to lock the row it's inserting, but window 2 still holds a gap lock on the range of rows where id > 1 — which includes the new value 6.

This is how MySQL ensures repeatable read. It uses gap locks to prevent insertion of new rows that would enter into the range it has locked, because a new row would affect the set of rows the current transaction is trying to preserve.

Eventually, window 1 will time out, if window 2 doesn't finish its transaction and release its gap lock:

ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

All of this has nothing to do with auto-increment. As you saw in my example inserts, I'm overriding the auto-increment anyway. It has only to do with locks on rows and gaps, regardless of how the values in those rows were generated.

I have to comment also that the problem you're trying to solve is essentially a publish/subscribe model, which is more suited to a message queue technology, not an RDBMS technology. You should consider using a message queue as a complementary technology to your RDBMS.