I have to build a tool that exports our data from HBase (HFiles) to HDFS in Parquet format.
Please suggest the best way to move data from HBase tables to Parquet tables.
We have to move 400 million records from HBase to Parquet. How can we achieve this, and what is the fastest way to move the data?
Thanks in advance.
Regards,
Pardeep Sharma.
Please have a look at the tmalaska/HBase-ToHDFS project, which reads an HBase table and writes it out as Text, Seq, Avro, or Parquet:
hadoop jar HBaseToHDFS.jar ExportHBaseTableToParquet exportTest c export.parquet false avro.schema
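
If it helps, here is a small wrapper sketch around the command above, so the values can be changed in one place and the final command inspected before it is submitted. The variable names are my own guesses at what each positional argument means (the thread does not spell them out), so please check them against the project's README before relying on them. The RUN switch is also just an illustration, not something the tool provides.

```shell
#!/bin/sh
# Values copied from the command above. The variable names are assumptions
# about what each position means, not documented parameter names.
JAR="HBaseToHDFS.jar"
JOB="ExportHBaseTableToParquet"
TABLE="exportTest"        # presumably the HBase table to read
FAMILY="c"                # presumably the column family
OUTPUT="export.parquet"   # presumably the output path on HDFS
FLAG="false"              # meaning not stated in this thread; check the README
SCHEMA="avro.schema"      # presumably the Avro schema file for the output

CMD="hadoop jar $JAR $JOB $TABLE $FAMILY $OUTPUT $FLAG $SCHEMA"

# Dry run by default: only print the command. Set RUN=1 to execute it.
if [ "${RUN:-0}" = "1" ]; then
    $CMD
else
    echo "$CMD"
fi
```

Running the script without RUN set only prints the assembled command, which is a cheap way to double-check the arguments before starting a job over 400 million records.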