
Difference between a plain Java program and a MapReduce Java program on an HBase table


I am new to Hadoop and HBase. I want to know the difference between writing a plain Java program and a MapReduce program (written in Java) when both perform the same task on the same data in an HBase table.

I know that Pig scripts and Hive queries are converted into MapReduce programs that process data on HDFS, and that HBase also stores its data on the DataNodes. So will a normal Java program be converted into mapper and reducer tasks and process the data on the DataNodes as a batch job, or will it work through the data linearly?

Please let me know how a plain Java program works on an HBase table to process the data. Thanks in advance!


Solution

  • I wanted to know the difference between writing a plain Java program and a MapReduce program (written in Java) when both perform the same task on the same data in an HBase table.

    HBase can be accessed through several kinds of clients. Two of them are a standalone HBase client written in Java and a MapReduce program (also written in Java).

    • A plain Java HBase client is generally suited to small amounts of data. It is not converted into a MapReduce job: it runs as a standalone client in a single process, is not distributed across the Hadoop cluster nodes, and is commonly used for testing.

    • MapReduce is meant for large datasets. It runs on YARN and divides the work across the cluster nodes based on input splits (parallelism), so on large datasets it generally finishes faster than a plain Java program.

    Both the plain Java program and the MapReduce program use the same client API and the same hbase.zookeeper.quorum setting, but they execute differently.
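
    The contrast between the two bullets can be illustrated without a cluster. The following is only a conceptual sketch, not HBase or Hadoop code: a sorted in-memory map stands in for an HBase table, threads stand in for map tasks running on cluster nodes, and every name in it (SplitScanSketch, sequentialSum, splitSum, slice, sampleTable, the "row-NN" keys) is made up for illustration. In a real HBase MapReduce job the splits usually follow region boundaries rather than hand-picked keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Conceptual stand-in only: a sorted in-memory map plays the role of an HBase
// table (row key -> value). No HBase or Hadoop classes are involved.
final class SplitScanSketch {

    // Plain-client style: a single process walks every row in key order.
    static long sequentialSum(NavigableMap<String, Long> table) {
        long sum = 0;
        for (long value : table.values()) {
            sum += value;
        }
        return sum;
    }

    // MapReduce style: cut the key space into contiguous ranges ("splits"),
    // process each range in its own task, then combine the partial results.
    static long splitSum(NavigableMap<String, Long> table, List<String> splitKeys) {
        ExecutorService pool = Executors.newFixedThreadPool(splitKeys.size() + 1);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            String start = null; // null means "unbounded on this side"
            for (int i = 0; i <= splitKeys.size(); i++) {
                String end = i < splitKeys.size() ? splitKeys.get(i) : null;
                NavigableMap<String, Long> range = slice(table, start, end);
                Callable<Long> task = () -> sequentialSum(range); // the "map" step
                partials.add(pool.submit(task));
                start = end;
            }
            long total = 0; // the "reduce" step: combine partial sums
            for (Future<Long> partial : partials) {
                total += partial.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("split task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    // Rows with start <= key < end; a null bound is treated as unbounded.
    static NavigableMap<String, Long> slice(NavigableMap<String, Long> table,
                                            String start, String end) {
        if (start == null && end == null) return table;
        if (start == null) return table.headMap(end, false);
        if (end == null) return table.tailMap(start, true);
        return table.subMap(start, true, end, false);
    }

    // Hypothetical sample data: row-00 .. row-09 holding the values 0 .. 9.
    static NavigableMap<String, Long> sampleTable() {
        NavigableMap<String, Long> table = new TreeMap<>();
        for (int i = 0; i < 10; i++) {
            table.put(String.format("row-%02d", i), (long) i);
        }
        return table;
    }
}
```

    Calling SplitScanSketch.splitSum(SplitScanSketch.sampleTable(), Arrays.asList("row-03", "row-07")) returns the same total as sequentialSum on the same table; only the way the work is divided differs, which is the point of the comparison above.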

    How does a plain Java program work on an HBase table to process the data?

    Using the client API, it connects through ZooKeeper (hbase.zookeeper.quorum and related settings) and then interacts with the HBase table. An example configuration is shown below.

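     import org.apache.hadoop.conf.Configuration;
     import org.apache.hadoop.hbase.HBaseConfiguration;

     // Start from the defaults (hbase-default.xml / hbase-site.xml on the classpath).
     Configuration config = HBaseConfiguration.create();
     // ZooKeeper host and client port used to locate the cluster.
     config.set("hbase.zookeeper.quorum", "121.33.6.94");
     config.set("hbase.zookeeper.property.clientPort", "2181");
     config.set("hbase.master", "121.33.6.94:60000");
     // Parent znode under which HBase registers itself in ZooKeeper.
     config.set("zookeeper.znode.parent", "/hbase-unsecure");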
    

    You can think of the plain Java client as analogous to the way a program talks to Hive through the JDBC API, although the mechanism is different.
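
    To make the plain-client path concrete, here is a minimal sketch of a standalone program that scans a table with the HBase client API. It assumes HBase 1.0 or later (for ConnectionFactory) with the hbase-client library on the classpath and a reachable cluster. The class name PlainHBaseScan and the table name "my_table" are hypothetical; the connection settings are simply the ones from the example above.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PlainHBaseScan {
    public static void main(String[] args) throws IOException {
        // Same settings as in the configuration example above.
        Configuration config = HBaseConfiguration.create();
        config.set("hbase.zookeeper.quorum", "121.33.6.94");
        config.set("hbase.zookeeper.property.clientPort", "2181");
        config.set("hbase.master", "121.33.6.94:60000");
        config.set("zookeeper.znode.parent", "/hbase-unsecure");

        // "my_table" is a hypothetical table name used only for illustration.
        try (Connection connection = ConnectionFactory.createConnection(config);
             Table table = connection.getTable(TableName.valueOf("my_table"));
             ResultScanner scanner = table.getScanner(new Scan())) {
            long rows = 0;
            // This loop runs in this one JVM: rows are fetched from the
            // region servers and processed here, one after another.
            for (Result result : scanner) {
                String rowKey = Bytes.toString(result.getRow());
                System.out.println(rowKey);
                rows++;
            }
            System.out.println("Rows scanned: " + rows);
        }
    }
}
```

    Nothing in this program is submitted to YARN. The scan is served by the region servers, but all the processing in the loop happens in the client process, which is why this style fits small data and testing rather than large batch jobs.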