Search code examples
phpalgorithmormcroniteration

iterate an actively used database table in cron jobs without missing any row


How would you correctly iterate all the rows of a database query when doing multiple passes of a long operation without missing any row? In my case i have alphanumeric IDs, so id column is not reliable...

The main problem is that the query results may change between each cron task execution.

A first idea could be to store the offset / "number of rows processed". But if one of the already processed rows is deleted the others would shift and the first one would be skipped.

If I store the id of the last processed row and skip to the one after it I have a worse problem: if that exact row is deleted the next cron job will skip every remaining row.

Is there any more "standard" way of iterating a table like this ?

I'm doing this in php with a third party orm engine using sql as data store, but i think it's more a general question so anyone feel free to retag this more appropriately.


Solution

  • Most DBMS have a reliable standard sorting. So the easiest solution would be not to use any sorting and use LIMIT and OFFSET definitions. E.g in the case of MySQL you could do something like this:

    SELECT *
    FROM mytable
    LIMIT 10,10
    ;
    

    This should be reliable.

    If you would like to take deleted rows into consideration, I would use some sort of queue. E.g.:

    CREATE TABLE item(
        id INT PRIMARY KEY
    );
    
    CREATE TABLE job (
        id INT PRIMARY KEY
    );
    
    CREATE TABLE job_queue(
        id INT PRIMARY KEY,
        job_id INT,
        item INT,
        FOREIGN KEY job_id REFERENCES job(id),
        FOREIGN KEY item REFERENCES item(id)
    );
    

    Then you could add all items to the queue by running:

    INSERT INTO job_queue (job_id, item_id)
    SELECT 1, id
    FROM item;
    

    Now you could fetch the items reliably by running the query:

    SELECT *
    FROM job_queue
    WHERE job_id = 1
    LIMIT 10;
    
    DELETE FROM job_queue WHERE id IN (
        SELECT id
        FROM job_queue
        WHERE job_id = 1
    )
    

    But there are many possible valid solutions. A final answer heavily depends on your requirements.