Search code examples
mysqlstreaminnodb

How to create an event stream on MySQL


Given the following InnoDB table:

event_stream
+ id BIGINT PRIMARY AUTO-INCREMENT
+ event TEXT

Considering that there are several clients concurrently inserting events into this stream: What technique could we use so that this event stream could be processed in an incremental way by multiple listeners/consumers?

(edit) I.E. I would like to have multiple clients attached to this stream that can react to new events once and keep track of their position on the stream.

Considerations:

  • Not using MySQL to store the events is not an option;
  • locking the entire table is not acceptable;
  • I want to keep the control of wether an event was already seem/not seem up to the consumer, there might be multiple consumers for this table;
  • Creating new columns is acceptable;
  • This table will grow to hundreds of millions of events;

Solution

  • "Don't queue it, just do it." I have found that a database makes a poor queuing mechanism. If the 'worker' threads don't take long to perform the tasks, then have the queuers simply perform the tasks; this removes the overhead of the queue, thereby possibly making the system faster and scale better.

    "Hundreds of millions of events" -- and nearly all of them have been "handled"? This suggests you have two tables -- one for handled events, one for events waiting to be handled. The latter would rarely have more than a few dozen rows?? In that case, the processing will work better.

    Have two extra columns: which worker owns the process, and when the worker grabbed it. The time is so that you can take care of the case (yes, eventually it will happen) of a worker grabbing a task, then dying -- thereby leaving the task orphaned. As separate job can 'reap' these orphans.

    A single-SQL UPDATE can grab one row in the table. Do this in a transaction by itself, not in any transaction(s) in the process. Similarly 'release' the task in its own transaction.

    The grab is something like this (with autocommit=ON):

    UPDATE ToDo SET who = $me, when = NOW()
        WHERE who IS NULL
        LIMIT 1;   -- grab one
    SELECT ... FROM ToDo WHERE who = $me;  -- get details on the task
    

    The 'release' probably involves both tables, something like this:

    BEGIN;
    $stuff = SELECT ... FROM ToDo WHERE who = $me;
    DELETE FROM ToDo WHERE who = $me;
    INSERT ... INTO History ... VALUES (most of stuff from $stuff);
    COMMIT;
    

    In between grabbing and releasing, you have as long as you need to perform the 'task'. You won't be tripped up by an InnoDB timeout, etc.

    If you would like to give further details about your queue and tasks, I may have further refinements.

    What I describe should handle any number inserters, any number of workers, and tasks lasting any length of time.

    AUTO_INCREMENT is not reliable for walking through an event list. An INSERT is multiple steps:

    1. Start transaction
    2. get next auto_incr id
    3. do the insert
    4. COMMIT -- only now do others see the new id

    It is possible (especially in replication) for the COMMITs to be "out of order" relative to the auto_incr.