Given the following InnoDB table:
event_stream
+ id BIGINT PRIMARY AUTO-INCREMENT
+ event TEXT
Considering that there are several clients concurrently inserting events into this stream: What technique could we use so that this event stream could be processed in an incremental way by multiple listeners/consumers?
(edit) I.E. I would like to have multiple clients attached to this stream that can react to new events once and keep track of their position on the stream.
Considerations:
"Don't queue it, just do it." I have found that a database makes a poor queuing mechanism. If the 'worker' threads don't take long to perform the tasks, then have the queuers simply perform the tasks; this removes the overhead of the queue, thereby possibly making the system faster and scale better.
"Hundreds of millions of events" -- and nearly all of them have been "handled"? This suggests you have two tables -- one for handled events, one for events waiting to be handled. The latter would rarely have more than a few dozen rows?? In that case, the processing will work better.
Have two extra columns: which worker owns the process, and when the worker grabbed it. The time is so that you can take care of the case (yes, eventually it will happen) of a worker grabbing a task, then dying -- thereby leaving the task orphaned. As separate job can 'reap' these orphans.
A single-SQL UPDATE
can grab one row in the table. Do this in a transaction by itself, not in any transaction(s) in the process. Similarly 'release' the task in its own transaction.
The grab is something like this (with autocommit=ON
):
UPDATE ToDo SET who = $me, when = NOW()
WHERE who IS NULL
LIMIT 1; -- grab one
SELECT ... FROM ToDo WHERE who = $me; -- get details on the task
The 'release' probably involves both tables, something like this:
BEGIN;
$stuff = SELECT ... FROM ToDo WHERE who = $me;
DELETE FROM ToDo WHERE who = $me;
INSERT ... INTO History ... VALUES (most of stuff from $stuff);
COMMIT;
In between grabbing and releasing, you have as long as you need to perform the 'task'. You won't be tripped up by an InnoDB timeout, etc.
If you would like to give further details about your queue and tasks, I may have further refinements.
What I describe should handle any number inserters, any number of workers, and tasks lasting any length of time.
AUTO_INCREMENT
is not reliable for walking through an event list. An INSERT
is multiple steps:
It is possible (especially in replication) for the COMMITs to be "out of order" relative to the auto_incr.