Tags: hadoop, hbase, flume

Using Flume to migrate data from MySQL to Hadoop


Please share your thoughts.

The requirement is to migrate the data in a MySQL database to Hadoop/HBase for analytics purposes.

The data should be migrated in real time or near real time. Can Flume support this?

What would be a better approach?


Solution

  • The direct answer to your question is yes. Flume is a distributed system for transporting and aggregating event- or log-structured data. If it is set up correctly, Flume can push data into Hadoop for continuous ingestion: an agent collects data from the source (in this case MySQL), and I am confident that once data is available at the source, the Flume sink will sync it to HDFS at millisecond-level latency. Once the data is in HDFS/HBase, you can run queries on it, and how quickly it can be processed depends on your infrastructure.

    So I would say the Flume configuration is critical for pushing data to HDFS in near real time. Beyond that, end-to-end latency depends on your MapReduce cluster's performance and on how your queries are written for the data being processed.

    I also found the following resource, which should help you understand how to use Flume with HDFS: http://assets.en.oreilly.com/1/event/61/Real-time%20Streaming%20Analysis%20for%20Hadoop%20and%20Flume%20Presentation.pdf
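Core Apache Flume does not ship a MySQL/JDBC source, so getting rows out of MySQL into a Flume agent usually means a custom or third-party source. The sketch below is a hypothetical, dependency-free illustration of the incremental-polling idea such a source might use: remember the highest primary key already shipped and, on each poll, emit only newer rows as Flume-style events (string headers plus a byte body). The table is simulated with an in-memory list; `poll_new_rows`, `row_to_event`, `IncrementalPoller` and the `id` column are assumptions for illustration, not Flume or MySQL APIs. Note that id-based polling only picks up inserted rows, not updates or deletes.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    """Flume-style event: string headers plus an opaque byte body."""
    headers: Dict[str, str]
    body: bytes


def row_to_event(table: str, row: Dict[str, object]) -> Event:
    # Hypothetical serialization: "col=value" pairs in sorted column order.
    body = ",".join(f"{k}={row[k]}" for k in sorted(row)).encode("utf-8")
    return Event(headers={"table": table, "id": str(row["id"])}, body=body)


def poll_new_rows(table_rows: List[Dict[str, object]], last_id: int, limit: int):
    # In-memory stand-in for: SELECT * FROM t WHERE id > ? ORDER BY id LIMIT ?
    newer = sorted((r for r in table_rows if r["id"] > last_id), key=lambda r: r["id"])
    return newer[:limit]


class IncrementalPoller:
    """Hypothetical polling source that tracks a primary-key high-water mark."""

    def __init__(self, table: str, table_rows: List[Dict[str, object]], batch_size: int = 100):
        self.table = table
        self.table_rows = table_rows
        self.batch_size = batch_size
        self.last_id = 0  # a real source would persist this so restarts don't re-send rows

    def poll(self) -> List[Event]:
        rows = poll_new_rows(self.table_rows, self.last_id, self.batch_size)
        events = [row_to_event(self.table, r) for r in rows]
        if rows:
            # Advance the high-water mark only after the batch was built.
            self.last_id = rows[-1]["id"]
        return events
```

Calling `poll()` on a short timer and handing the resulting events to a channel is what would make this near real time; the polling interval trades load on MySQL against latency.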
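The point that Flume configuration largely determines how close to real time the data lands can be illustrated with a simplified, hypothetical model of a size-or-time flush policy: events are buffered and written out when either `batch_size` events have accumulated or `max_wait_s` seconds have passed since the oldest buffered event. Smaller values lower latency but produce more, smaller writes (on HDFS, many small files). This `BatchingSink` class is an illustration of the trade-off, not Flume's HDFS sink implementation; in a real agent the comparable knobs are HDFS sink properties such as `hdfs.batchSize`, `hdfs.rollInterval`, `hdfs.rollSize` and `hdfs.rollCount`.

```python
from typing import Callable, List, Optional


class BatchingSink:
    """Simplified size-or-time flush policy (not Flume's actual HDFS sink code)."""

    def __init__(self, batch_size: int, max_wait_s: float,
                 write: Callable[[List[bytes]], None]):
        self.batch_size = batch_size
        self.max_wait_s = max_wait_s
        self.write = write                      # e.g. append to an HDFS file
        self.buffer: List[bytes] = []
        self.first_ts: Optional[float] = None   # arrival time of oldest buffered event

    def put(self, body: bytes, now: float) -> None:
        if not self.buffer:
            self.first_ts = now
        self.buffer.append(body)
        self.maybe_flush(now)

    def maybe_flush(self, now: float) -> None:
        # Called on every put and, in a real sink, periodically by a timer.
        if not self.buffer:
            return
        full = len(self.buffer) >= self.batch_size
        stale = now - self.first_ts >= self.max_wait_s
        if full or stale:
            self.write(self.buffer)
            self.buffer = []                    # new list; the written batch is left intact
            self.first_ts = None
```

Timestamps are passed in explicitly to keep the example deterministic; a real sink would read a clock and run `maybe_flush` from a timer so a half-full batch is not held indefinitely.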