Tags: mysql, amazon-web-services, cron, rds

MySQL dump update every night to local VM


Why

My company logs every request that reaches our PHP application to our database in AWS, because we have found it to be extremely helpful when tracking down errors in our system. Unfortunately, we get about 200,000 requests per day, and each request inserts into 3-5 tables (depending on whether the request had an error), so the tables storing our logs are cleared out regularly.

However, we would like to keep the log data longer, and the best way to do that is to store it locally rather than on AWS. But having PHP connect to the local database for every request isn't viable. (It would cause MAJOR system slowdowns.)

How

I have decided the best way to handle this is to dump the past 24 hours of data from the log tables only to the local machine, and then clear the dumped data from production. Production would never hold more than 24 hours' worth of data, and we could keep the data for up to 6 months locally.
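One way to schedule such a nightly job is a cron entry like the sketch below. The script path and time of day are placeholders, not part of the question; the script itself would run the dump-load-truncate steps discussed in the answer.

```shell
# m h dom mon dow  command — run the archive job nightly at 02:00 (placeholder path)
0 2 * * * /usr/local/bin/archive_request_logs.sh >> /var/log/archive_request_logs.log 2>&1
```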

Problem

Ideally, I would rather not write a PHP script to transfer large amounts of data between two databases. I think it's possible to use mysqldump to dump updates for a specific table from one database to another, but I have no idea how. (The same goes for binary logs.)

My Question: How do I use either mysqldump or bin logs to only add rows from a specific table to a local database from AWS?


Solution

  • To use binlogs, you could download all binlogs to your local machine, convert them to SQL using mysqlbinlog, and load the result into your local database. See https://dev.mysql.com/doc/refman/5.7/en/mysqlbinlog.html
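    For the download step, mysqlbinlog itself can read binlogs from a remote server. A sketch, assuming a placeholder endpoint, user, and binlog file name, and that binlog retention on the source is long enough to cover the window you want:

    ```shell
    # Fetch the named binlog from the remote server; --raw saves it in
    # binary form instead of decoding it immediately.
    mysqlbinlog --read-from-remote-server \
      --host=myinstance.example.rds.amazonaws.com --user=repl --password \
      --raw mysql-bin-changelog.000123

    # Decode the downloaded file to SQL and replay it against the local server.
    mysqlbinlog mysql-bin-changelog.000123 | mysql --user=archiver --password local_archive
    ```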

    Then you can truncate your tables on production without adding the truncate table to the binlog. That way your tables in production will be truncated, but when you replay the binlog against your local MySQL, the local MySQL won't be truncated.

    SET SESSION sql_log_bin=0;  -- stop writing this session's statements to the binlog
    TRUNCATE TABLE <tablename>;
    SET SESSION sql_log_bin=1;  -- re-enable binary logging for this session
    

    But this is risky: if you forget to skip the binlog for the TRUNCATE TABLE statement even once, it will truncate all the data in that table in your local MySQL as well!

    If you can't omit TRUNCATE TABLE from the binlog, you could use sed to filter out the TRUNCATE TABLE statement as you replay the binlog locally.

    mysqlbinlog <binlogs> | sed -e '/^TRUNCATE TABLE/d' | mysql ...
    

    That's just an example. You might need to be more thorough with the pattern. I haven't tested it.
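    One quick way to sanity-check the filter before trusting it is to pipe a fabricated sample through the same sed command. The statements below are made up for the test, not real mysqlbinlog output:

    ```shell
    # Fabricated stand-in for decoded binlog output, just to exercise the filter:
    printf 'INSERT INTO logs VALUES (1);\nTRUNCATE TABLE logs;\nINSERT INTO logs VALUES (2);\n' \
      | sed -e '/^TRUNCATE TABLE/d'
    # The TRUNCATE line is dropped; both INSERT lines pass through.
    ```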

    With mysqldump, you can dump the data only, without the DROP TABLE / CREATE TABLE statements that are usually added to the dump.

    mysqldump --single-transaction --no-create-info mydatabase mytable ...
    

    The dump should then contain only INSERT statements, which you can replay against your local MySQL. The local database will accumulate more and more data as you load each day's dump.
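    Putting the pieces together, the nightly mysqldump job might look like the sketch below. The hostnames, users, database, and the `request_log` table name are placeholders, and it assumes the production account is allowed to set `sql_log_bin`:

    ```shell
    # 1. Dump the log table from production (data only, no DDL) and replay
    #    it straight into the local archive database.
    mysqldump --single-transaction --no-create-info \
      --host=myinstance.example.rds.amazonaws.com --user=dumper --password \
      mydatabase request_log \
    | mysql --host=127.0.0.1 --user=archiver --password local_archive \
    && \
    # 2. Only if the load succeeded, clear the table on production,
    #    keeping the TRUNCATE out of the binlog.
    mysql --host=myinstance.example.rds.amazonaws.com --user=dumper --password \
      -e 'SET SESSION sql_log_bin=0; TRUNCATE TABLE mydatabase.request_log;'
    ```

    Chaining the two steps with `&&` means a failed dump or load leaves production untouched, so no log rows are lost to a truncate that nothing archived.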