Search code examples
mysqldatabaseamazon-redshiftdata-migrationdata-pipeline

What is the best way to automate replication of RDS (MySQL) schema to AWS Redshift?


We use ruby scripts to migrate data from MySQL to Redshift(PostgreSQL).Currently we use YAML configuration files to maintain schema information (column names and types).So whenever a MySQL table is altered, we need to manually change the YAML files.Now, we are thinking of moving YAML configurations to database(PSQL) as JSON object. But still we need to manually change the schema configurations.We want to automate the process.(Preferably using Ruby.)

PS: We use bin-log to incrementally update data in AWS Redshift.


Solution

  • There are a couple of possible solutions for you.

    The first one is to use the AWS Data Pipeline. It's a service that moves data between different AWS compute and storage services. (https://aws.amazon.com/datapipeline/)

    Another option is to use a data integration tool such as Alooma. It can replicate tables from MySQL database hosted on Amazon RDS to a AWS Redshift in near real time.

    It supports the bin-log replication method you're using.

    Follow this steps to replicate your data:

    Setting up bin-log for hosted MySQL:

    To replicate MySQL tables Alooma needs row-based replication to be set up to be able to read your MySQL events.

    1. Verify that your binlog is configured by running the statement below - the value returned should be ROW:

      mysql> SHOW GLOBAL VARIABLES LIKE 'binlog_format';

    2. Verify that your binlog row format is set to FULL:

      mysql> SHOW GLOBAL VARIABLES LIKE 'binlog_row_image';

    3. Verify that your log slave updates option is set to ON:

      mysql> SHOW GLOBAL VARIABLES LIKE 'log_slave_updates';

    If your MySQL is not currently set to row based replication, you'll need to set it up.

    1. Include the following lines in your my.cnf file (under /etc/mysql/). The sample below is for a "vanilla" MySQL 5.6 installation:

      [mysqld] log-bin=mysql-bin server-id=1 binlog_format = ROW log-slave-updates=true

    2. Make sure to restart your MySQL server to pick up the changes.

    3. Create an Alooma user, similar to the MySQL slave user:

      mysql> CREATE USER 'alooma'@'%' IDENTIFIED BY 'YOUR_PASSWORD';

      mysql> GRANT REPLICATION SLAVE, SELECT, RELOAD, REPLICATION CLIENT, LOCK TABLES, EXECUTE ON . TO 'alooma'@'%';

      mysql> FLUSH PRIVILEGES;