Search code examples
mysqlviewsynchronizationrdsamazon-redshift

Synchronize data from MySql to Amazon RedShift


We do some aggregation on huge datasets in Amazon RedShift, and we have some relatively small amount of data in MySQL. For some of the joins in RedShift we need the data in MySQL. What is the best way to synchronize the MySql data to RedShift? Is there such a thing in redshift like the remote view in oracle? Or should I programatically query MySql and insert / update in RedShift?


Solution

  • When MySQL data is required for joins in Redshift, we usually just send it over from one to another.

    It implies:

    1. Redshift: Creating an analogous table schema (bearing in mind Redshift/PSQL's particularities)
    2. MySQL: Dumping the data table (in csv format)
    3. Zipping the export, and sending it to S3
    4. Redshift: Truncating the table, and importing all data using COPY

    Steps 2 to 4 can be scripted, and allow you to send fresh data over to Redshift when necessary or regularly.