hadoop hive hdfs replication

Fast HDFS and Hive data replication


I'm considering data replication between clusters for 2 use cases:

  1. DR (so replication between 2 data centers)
  2. Sync between 2 production clusters

For the first one, I tend to think Falcon is the right option. For the second one, though, I want to replicate data as soon as it is available, meaning at the end of the put for HDFS and at the end of table creation for Hive. What would be your view on this?
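One way to picture the HDFS half of the second use case is a minimal sketch, assuming plain `hadoop distcp -update` is used as the copy step (an assumption, the question does not name a tool for this case). It relies on the fact that `hdfs dfs -put` writes to a temporary `._COPYING_` file and renames it when the put ends. The NameNode URIs and the function names are hypothetical.

```python
# Temporary suffix used by `hdfs dfs -put` while a file is still being written.
COPYING_SUFFIX = "._COPYING_"


def is_put_finished(path: str) -> bool:
    """Return True if the path does not look like an in-flight put."""
    return not path.endswith(COPYING_SUFFIX)


def build_distcp_command(src_nn: str, dst_nn: str, path: str) -> list[str]:
    """Build a `hadoop distcp -update` command for one finished path.

    src_nn and dst_nn are hypothetical NameNode URIs such as
    "hdfs://prod-a:8020"; the same path is used on both clusters.
    """
    if not is_put_finished(path):
        raise ValueError(f"put still in progress: {path}")
    return ["hadoop", "distcp", "-update", f"{src_nn}{path}", f"{dst_nn}{path}"]
```

The returned list could be handed to `subprocess.run` by whatever process notices the finished file. How that process gets notified, and the Hive metadata side, are left open here.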


Solution

  • Just discovered ReAir: https://github.com/airbnb/reair

    It seems like a good tool to look at. :)