
Load data in HBase from HDFS without using Pig Script


I have .csv files in HDFS. I want to load them into HBase tables without using a Pig script.

Is there any other way to do this?


Solution

  • There are several ways to do this; two common options are described below.

    Option 1: The simple way, ImportTsv

    ImportTsv is a utility that loads data in TSV format into HBase. It has two distinct usages: loading data from TSV files in HDFS into HBase via Puts, and preparing StoreFiles to be loaded via the completebulkload utility. It also works for CSV: the default separator is a tab, so pass -Dimporttsv.separator=, for comma-separated files (as in the example at the end of this option).

    To load data via Puts (i.e., non-bulk loading):

    $ bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=a,b,c <tablename> <hdfs-inputdir>
    
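    Note that the value of -Dimporttsv.columns must name every input field: exactly one entry must be HBASE_ROW_KEY (marking which field becomes the row key), and the rest are family:qualifier names. A minimal sketch, assuming a table mytable with column family cf and an input directory /user/hduser/input (both placeholders):

    $ bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:c1,cf:c2 mytable /user/hduser/input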

    To generate StoreFiles for bulk-loading:

    $ bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=a,b,c -Dimporttsv.bulk.output=hdfs://storefile-outputdir <tablename> <hdfs-data-inputdir>
    

    These generated StoreFiles can then be loaded into HBase with the completebulkload utility (see Section 14.1.10, “CompleteBulkLoad”, in the HBase Reference Guide).
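
    For example, a minimal invocation might look like this (reusing the placeholder output directory and table name from the command above):

    $ bin/hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles hdfs://storefile-outputdir <tablename>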

    Example (for CSV input; note the separator override and the required table name argument):

    $ hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=, -Dimporttsv.columns="HBASE_ROW_KEY,cf:c1,cf:c2,..." <tablename> hdfs://servername:/tmp/yourcsv.csv

    Option 2: Custom MapReduce job

    Write your own MapReduce program, with a custom CSV parser in case the CSV is too complex for ImportTsv (for example, quoted fields that contain embedded commas); a minimal sketch of this approach follows.
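
    A minimal sketch of such a job, assuming the HBase 1.x+ client API and a simple three-field CSV layout (row key followed by two values); the class name, table argument, column family cf, and qualifiers c1/c2 are all placeholders:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class CsvToHBase {

        // Map-only job: each CSV line becomes one Put written straight to HBase.
        static class CsvMapper
                extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

            private static final byte[] CF = Bytes.toBytes("cf"); // assumed column family

            @Override
            protected void map(LongWritable offset, Text line, Context context)
                    throws IOException, InterruptedException {
                // Naive split; swap in a real CSV parser (e.g. OpenCSV) if
                // fields may contain quoted commas.
                String[] fields = line.toString().split(",");
                if (fields.length < 3) {
                    return; // skip malformed lines
                }
                byte[] rowKey = Bytes.toBytes(fields[0]);
                Put put = new Put(rowKey);
                put.addColumn(CF, Bytes.toBytes("c1"), Bytes.toBytes(fields[1]));
                put.addColumn(CF, Bytes.toBytes("c2"), Bytes.toBytes(fields[2]));
                context.write(new ImmutableBytesWritable(rowKey), put);
            }
        }

        public static void main(String[] args) throws Exception {
            // args[0] = HDFS input dir with CSV files, args[1] = target table name
            Configuration conf = HBaseConfiguration.create();
            Job job = Job.getInstance(conf, "csv-to-hbase");
            job.setJarByClass(CsvToHBase.class);
            job.setMapperClass(CsvMapper.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));

            // Wires up TableOutputFormat (and the HBase jars) for the target
            // table; a null reducer plus zero reduce tasks makes this a pure
            // map-side load.
            TableMapReduceUtil.initTableReducerJob(args[1], null, job);
            job.setNumReduceTasks(0);

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

    Because the mapper writes Puts directly (there are no reducers), rows stream into HBase as each split is processed; for very large inputs, the StoreFile plus completebulkload route from Option 1 is generally faster.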