Tags: mysql, performance, replication

Deleting old records from a mysql table but keeping a backup


I have a MySQL table that only requires data up to 30 days prior to today's date, but it holds data that can be up to a few years after today's date. For faster querying, I usually delete the old records, since I see no point in searching through them. However, I still maintain a backup copy of the records in case we ever need them for analytics. The original table is this:

CREATE TABLE featured_deal (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    fsa VARCHAR(10),
    poster_id INT(11),
    dealid BIGINT(20),
    bookedDate DATE,
    createDate TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE KEY `featured_unique` (fsa, bookedDate)
);

And I create a replica of this table, featured_deal_history:

CREATE TABLE featured_deal_history (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    fsa VARCHAR(10),
    poster_id INT(11),
    dealid BIGINT(20),
    bookedDate DATE,
    createDate TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE KEY `featured_unique` (fsa, bookedDate)
);

I have a trigger to populate the history table whenever an insert happens on the original table:

CREATE TRIGGER `featured_to_history`
    AFTER INSERT ON featured_deal
    FOR EACH ROW
    INSERT INTO featured_deal_history (fsa, poster_id, dealid, bookedDate, createDate)
    VALUES (NEW.fsa, NEW.poster_id, NEW.dealid, NEW.bookedDate, NEW.createDate);

Finally, I clean the table using a cron job and the command:

    DELETE FROM featured_deal WHERE bookedDate < DATE_SUB(CURDATE(), INTERVAL 30 DAY);

Is there a better way to perform the above task? I thought about MySQL partitions; however, I don't have a fixed partitioning scheme - the date keeps moving, so I'd need two new partitions every day.
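
For reference, the kind of scheme I considered is sketched below (hypothetical - and partitioning would additionally require every unique key, including the primary key, to contain bookedDate, which my schema doesn't satisfy):

    -- Hypothetical sketch of date-based RANGE partitioning; the daily
    -- maintenance below is exactly what I want to avoid
    ALTER TABLE featured_deal
        PARTITION BY RANGE (TO_DAYS(bookedDate)) (
            PARTITION p20130101 VALUES LESS THAN (TO_DAYS('2013-01-01')),
            PARTITION pmax VALUES LESS THAN MAXVALUE
        );

    -- Every day: carve the next day's partition out of pmax...
    ALTER TABLE featured_deal REORGANIZE PARTITION pmax INTO (
        PARTITION p20130102 VALUES LESS THAN (TO_DAYS('2013-01-02')),
        PARTITION pmax VALUES LESS THAN MAXVALUE
    );
    -- ...and drop the partition that has aged past 30 days
    ALTER TABLE featured_deal DROP PARTITION p20130101;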


Solution

  • In principle your approach is OK; however, the concept is predicated on the idea that a smaller table is more performant. This implies that your queries are running full table scans against the data - i.e. you've not configured your indexes correctly.

    I suggest that the first thing you fix is the performance of your queries.
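
    For example (assuming the hot queries filter on bookedDate - the existing UNIQUE KEY on (fsa, bookedDate) only helps queries that also constrain fsa, because bookedDate is its second column):

    -- Check whether the 30-day query is doing a full table scan (type: ALL)
    EXPLAIN SELECT * FROM featured_deal
    WHERE bookedDate >= DATE_SUB(CURDATE(), INTERVAL 30 DAY);

    -- If so, a dedicated index turns it into a range scan
    ALTER TABLE featured_deal ADD INDEX idx_bookedDate (bookedDate);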

    If you still need to keep stuff out of the hot data table, then you should do any inserts into the history table as a bulk operation, NOT a row at a time - this will keep the table and indexes in a healthy state. This could be done in a batch operation as suggested by Cristian (sketched after the example below), or you could use a stochastic method with a state variable in the source table. Note that MySQL won't let a trigger modify the table it fires on, so the stochastic flush needs to live in a stored procedure or scheduled event rather than in the AFTER INSERT trigger itself - e.g. something like...

    DELIMITER //
    CREATE PROCEDURE flush_to_history()
    BEGIN
        -- Roughly 1 call in 20 does any work, so the cost is amortised
        IF (0.95 < RAND()) THEN
            -- Claim all unarchived rows (assumes an extra nullable INT
            -- column `control` on featured_deal)
            UPDATE featured_deal SET control = CONNECTION_ID()
            WHERE control IS NULL;
            -- Copy the claimed rows across in one bulk insert
            INSERT INTO featured_deal_history (fsa, poster_id, dealid, bookedDate, createDate)
                SELECT fsa, poster_id, dealid, bookedDate, createDate
                FROM featured_deal
                WHERE control = CONNECTION_ID();
            -- Mark the claimed rows as archived
            UPDATE featured_deal SET control = -1
            WHERE control = CONNECTION_ID();
        END IF;
    END //
    DELIMITER ;
    
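    The batch version (Cristian's suggestion) would be a scheduled job along these lines - a sketch assuming InnoDB, and it replaces the per-row trigger, since otherwise the bulk insert would collide with rows the trigger already copied:

    -- One bulk copy plus one bulk delete, inside a transaction so no
    -- row can be lost between the two statements
    START TRANSACTION;

    INSERT INTO featured_deal_history (fsa, poster_id, dealid, bookedDate, createDate)
        SELECT fsa, poster_id, dealid, bookedDate, createDate
        FROM featured_deal
        WHERE bookedDate < DATE_SUB(CURDATE(), INTERVAL 30 DAY);

    DELETE FROM featured_deal
    WHERE bookedDate < DATE_SUB(CURDATE(), INTERVAL 30 DAY);

    COMMIT;
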

    One further consideration is that you generate a new id when you write to the history table: why?
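
    If the intent is to keep the history row keyed identically to the source row, one option (a sketch) is to drop the AUTO_INCREMENT from the history table and copy the id across explicitly:

    -- Keep the source table's id instead of generating a fresh one
    ALTER TABLE featured_deal_history
        MODIFY id INT NOT NULL;  -- still the primary key, no AUTO_INCREMENT

    -- Replace the original trigger with one that carries the id over
    DROP TRIGGER IF EXISTS `featured_to_history`;
    CREATE TRIGGER `featured_to_history`
        AFTER INSERT ON featured_deal
        FOR EACH ROW
        INSERT INTO featured_deal_history (id, fsa, poster_id, dealid, bookedDate, createDate)
        VALUES (NEW.id, NEW.fsa, NEW.poster_id, NEW.dealid, NEW.bookedDate, NEW.createDate);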