hadoop sqoop bigdata

How to move data from RDBMS to Hadoop without Sqoop?


I need to move a huge amount of data from an RDBMS to Hadoop without using Sqoop. I have a database of 2200 tables, and importing them to HDFS with Sqoop is a hectic job that consumes a lot of time. Hitting the database with a select each time also affects its performance. I have more sources to move from RDBMS to HDFS, and I query the files in HDFS with Hive. Can someone help me with a more efficient way?


Solution

  • You could always do it manually with any back-end code: read the data from the database and stream it to HDFS.
    Then, in your application configuration, you can have whatever customization you need (threads, timeouts, batch sizes, etc.). This is a rather straightforward solution.
    We tried this once, for a reason I no longer remember, but mostly we use Sqoop and have no issues with it.
    You could also make a copy (some kind of replica) of the database that is used by no external system other than your Sqoop job, so the job and user selects would not affect each other's performance.
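
A minimal sketch of the "read from the database and stream to HDFS" idea, in Python. Everything here is an assumption made for illustration: `export_table` and `batch_size` are hypothetical names, `sqlite3` stands in for the real RDBMS driver, and `sink` is any text file-like object. In a real job it would be a file handle opened through whatever HDFS client library you use; here it is an in-memory buffer so the example runs on its own.

```python
import csv
import io
import sqlite3


def export_table(conn, table, sink, batch_size=1000):
    """Stream all rows of `table` to `sink` as CSV, `batch_size` rows at a time.

    Returns the number of rows written.
    """
    cur = conn.cursor()
    # The table name is interpolated into the SQL, so it must come from a
    # trusted list of table names, never from user input.
    cur.execute(f'SELECT * FROM "{table}"')
    writer = csv.writer(sink, lineterminator="\n")
    total = 0
    while True:
        # fetchmany keeps at most one batch in memory at a time.
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        writer.writerows(rows)
        total += len(rows)
    return total


# Stand-in source database (assumption: the real one would be your RDBMS).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

# Stand-in sink (assumption: the real one would be a file opened on HDFS).
sink = io.StringIO()
written = export_table(conn, "users", sink, batch_size=2)
```

With 2200 tables, the same function could be called once per table from a pool of workers, each with its own database connection. The pool size and `batch_size` are then the knobs that limit how hard the export hits the source database.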