Search code examples
mysqlamazon-web-servicesrds

AWS RDS - automatic backup vs snapshot with MyISAM tables


I have an AWS RDS MySQL 5.7 database with MyISAM tables that I would like to migrate to another RDS in a custom VPC, and once migrated, convert those MyISAM tables to InnoDB. If I undertood correctly, the only way to create a correct automatic backup is using the following procedure explained here: "Automated Backups with Unsupported MySQL Storage Engines" https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_WorkingWithAutomatedBackups.html#Overview.BackupDeviceRestrictions

  1. Stop all activity to your MyISAM tables (that is, close all sessions). You can close all sessions by calling the mysql.rds_kill command for each process that is returned from the SHOW FULL PROCESSLIST command.
  2. Lock and flush each of your MyISAM tables
  3. Create a snapshot of your DB instance. When the snapshot has completed, release the locks and resume activity on the MyISAM tables

Has someone done this procedure before? How is that the snapshots are being created successfully every night from the current RDS DBInstance, even though it contains MyISAM tables?

Thanks!


Solution

  • The problem isn't with snapshot creation. It's what can go wrong when you actually try to use one of the snapshots.

    RDS snapshots work by capturing a snapshot if your RDS instance's underlying EBS volume (you can't see this volume, but it's there -- RDS runs on EC2, with "hidden" instances and volumes).

    EBS snapshots capture the entire contents of the hard drive exactly as they happened to exist at the moment in time when the snapshot process starts.

    What ends up on the snapshot is essentially the same thing that you would have on a MySQL Server if you executed sudo killall -9 mysqld -- it is as if the server had halted everything, immediately, without doing any of the things it normally does to clean up for a graceful shutdown. With RDS, things are not quite that dramatic, because RDS does take some precautions, but fundamentally, this is the nature of what is happening.

    When you create an RDS instance from a snapshot, the first thing that happens when the instance starts up is the same thing your hypothetical server would do when you restarted the killed MySQL Server daemon: InnoDB Crash Recovery.

    InnoDB Crash Recovery

    To recover from a MySQL server crash, the only requirement is to restart the MySQL server. InnoDB automatically checks the logs and performs a roll-forward of the database to the present. InnoDB automatically rolls back uncommitted transactions that were present at the time of the crash.

    https://dev.mysql.com/doc/refman/5.7/en/innodb-recovery.html#innodb-crash-recovery

    Crash recovery is InnoDB's mechanism for bringing everything back into harmony in it internal data structures and ensure that all data is intact, exactly as your application left it. It's possible because InnoDB is a transactional storage engine. That means a lot of different things, but what it specifically means in this case is that InnoDB doesn't just change table data when you change a table. It goes through a process that can be simplified something like this:

    • store the proposed changes to disk¹
    • actually make the changes
    • mark the changes as complete

    What this means is that until the changes are finalized, InnoDB can be interrupted and will subsequently be able to pick up where it left off, without corrupting or losing data.

    MyISAM has no such mechanisms. It just writes to the data files, directly. Even if a MyISAM table isn't actively being used, it may still need to be repaired when the server comes up, to clean up its structures. In some circumstances, repairing the table can be impossible, and all or part of the data in the table will be lost.

    If your MyISAM tables are flushed and locked when the snapshot occurs, they are in a quiescent state on the disk, as though the server had actually been gracefully shut down before the snapshot had occurred, so they will be stable on the snapshot.

    But the snapshot process will always appear to succeed, because the snapshot is just duplicating whatever is on the disk, as it appears at the moment in time when the snapshot gets underway.

    The problem is that what the snapshot captured may not be usable, and you have no way of knowing whether the snapshot will be fully viable.


    ¹ Note that the first step, "store the proposed changes to disk" is related to the system variable innodb_flush_log_at_trx_commit which makes the system slower if set to 1 but also is the safest setting, because your query doesn't actually succeed until that first step is done. A setting of 2 is still reasonably safe, because it still writes the changes but continues without requiring that the operating system confirm that they have actually been written to the hard drive before your query returns success... but in a crash, a transaction your application thinks was committed may or may not have survived.