
Load data in HBase from HDFS without using Pig Script


I have .csv files in HDFS. I want to load them into HBase tables without using a Pig script.

Is there any other way to do this?


Solution

  • There are several ways to do this; two common options are described below.

    Option 1: The simple way, ImportTsv

    ImportTsv is a utility that loads data in TSV format into HBase. It has two distinct usages: loading data from TSV files in HDFS into HBase via Puts, and preparing StoreFiles to be loaded via the completebulkload utility. It also works for CSV: the default separator is a tab, so pass -Dimporttsv.separator=, for comma-separated files (as in the example at the end of this option).

    To load data via Puts (i.e., non-bulk loading):

    $ bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=a,b,c <tablename> <hdfs-inputdir>
    
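    Note that the value of -Dimporttsv.columns must name every input field: exactly one entry must be HBASE_ROW_KEY (marking which field becomes the row key), and the rest are family:qualifier names. A minimal sketch, assuming a table mytable with column family cf and an input directory /user/hduser/input (both placeholders):

    $ bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:c1,cf:c2 mytable /user/hduser/input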

    To generate StoreFiles for bulk-loading:

    $ bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=a,b,c -Dimporttsv.bulk.output=hdfs://storefile-outputdir <tablename> <hdfs-data-inputdir>
    

    These generated StoreFiles can then be loaded into HBase with the completebulkload utility (see Section 14.1.10, “CompleteBulkLoad”, in the HBase Reference Guide).
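
    For example, a minimal invocation might look like this (reusing the placeholder output directory and table name from the command above):

    $ bin/hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles hdfs://storefile-outputdir <tablename>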

    Example (for CSV input; note the separator override and the required table name argument):

    $ hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=, -Dimporttsv.columns="HBASE_ROW_KEY,cf:c1,cf:c2,..." <tablename> hdfs://servername:/tmp/yourcsv.csv

    Option 2: Custom MapReduce job

    Write your own MapReduce program, with a custom CSV parser in case the CSV is too complex for ImportTsv (for example, quoted fields that contain embedded commas); a minimal sketch of this approach follows.
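
    A minimal sketch of such a job, assuming the HBase 1.x+ client API and a simple three-field CSV layout (row key followed by two values); the class name, table argument, column family cf, and qualifiers c1/c2 are all placeholders:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class CsvToHBase {

        // Map-only job: each CSV line becomes one Put written straight to HBase.
        static class CsvMapper
                extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

            private static final byte[] CF = Bytes.toBytes("cf"); // assumed column family

            @Override
            protected void map(LongWritable offset, Text line, Context context)
                    throws IOException, InterruptedException {
                // Naive split; swap in a real CSV parser (e.g. OpenCSV) if
                // fields may contain quoted commas.
                String[] fields = line.toString().split(",");
                if (fields.length < 3) {
                    return; // skip malformed lines
                }
                byte[] rowKey = Bytes.toBytes(fields[0]);
                Put put = new Put(rowKey);
                put.addColumn(CF, Bytes.toBytes("c1"), Bytes.toBytes(fields[1]));
                put.addColumn(CF, Bytes.toBytes("c2"), Bytes.toBytes(fields[2]));
                context.write(new ImmutableBytesWritable(rowKey), put);
            }
        }

        public static void main(String[] args) throws Exception {
            // args[0] = HDFS input dir with CSV files, args[1] = target table name
            Configuration conf = HBaseConfiguration.create();
            Job job = Job.getInstance(conf, "csv-to-hbase");
            job.setJarByClass(CsvToHBase.class);
            job.setMapperClass(CsvMapper.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));

            // Wires up TableOutputFormat (and the HBase jars) for the target
            // table; a null reducer plus zero reduce tasks makes this a pure
            // map-side load.
            TableMapReduceUtil.initTableReducerJob(args[1], null, job);
            job.setNumReduceTasks(0);

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

    Because the mapper writes Puts directly (there are no reducers), rows stream into HBase as each split is processed; for very large inputs, the StoreFile plus completebulkload route from Option 1 is generally faster.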