MAPREDUCE - BULK LOADING DATA INTO HBASE TABLE

Why do we use only a driver class and a mapper class, and not a reducer class?

Solution

A reducer is needed only if you want to aggregate the data before it is bulk loaded.

In the normal case, where you are just loading the data without any aggregation, a mapper-only job is enough.

For example:

Case 1:

If you are reading a CSV file and loading all the words into the HBase table under their respective columns, i.e. without an aggregation such as word count, then a mapper-only job is enough.

Case 2:

If you are reading a CSV file and want to perform an aggregation such as word count before loading the result into HBase, then you need a reducer.
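To make the difference between the two cases concrete, here is a minimal sketch in plain Python. It is only a simulation and uses no Hadoop or HBase libraries: the dictionary standing in for the HBase table, the `cf:` column names, and the function names are all hypothetical. In a real Hadoop job, the mapper-only case is typically configured in the driver by calling `job.setNumReduceTasks(0)`.

```python
from collections import defaultdict


def map_only_load(csv_lines):
    """Case 1: each CSV row becomes one table row. Nothing is combined
    across rows, so no reduce step is needed."""
    table = {}  # hypothetical stand-in for an HBase table: row key -> columns
    for line in csv_lines:
        row_key, name, city = line.split(",")
        table[row_key] = {"cf:name": name, "cf:city": city}
    return table


def map_phase(csv_lines):
    """Case 2, map step: emit a (word, 1) pair for every word in every field."""
    for line in csv_lines:
        for field in line.split(","):
            for word in field.split():
                yield word.lower(), 1


def reduce_phase(pairs):
    """Case 2, reduce step: sum the counts per word. Each total would then
    be written to the table as one row keyed by the word."""
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return {word: {"cf:count": total} for word, total in counts.items()}


rows = ["1,alice,paris", "2,bob,paris"]

# Case 1: straight load, mapper only.
loaded = map_only_load(rows)

# Case 2: word count needs values for the same key brought together,
# which is exactly what the shuffle and reduce phases provide.
word_counts = reduce_phase(map_phase(rows))
```

In case 1 every output row depends on a single input row, so the mapper can write it directly. In case 2 the count for `paris` depends on two different input rows, so a reducer is required to see both.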

Hope that clarifies.
