How does a data read happen in HBase?

We know HBase is deployed on top of Hadoop and HDFS. We also know that reading a file (or record) from HDFS with the HDFS CLI takes a considerable amount of time.

But even though HBase uses HDFS, it can read a key within a couple of milliseconds. How does this happen?

Solution

I think the reasons include:

Data is split across different Region Servers. The client can look up the responsible Region Server in the META table and then communicate with that Region Server directly.
Region Servers are co-located with the HDFS DataNodes, which enables data locality (putting the data close to where it is needed) for the data served by the Region Servers.
An HFile contains a multi-layered index, which allows HBase to seek to the data without having to read the whole file.
HBase reads from the BlockCache and MemStore first. If the data is found there, HBase doesn't need to read HFiles from HDFS.
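
To make the last two points more concrete, here is a toy model in Python of the lookup order described above. It is only a sketch, not HBase code: ToyHFile, ToyRegion, find_block, read_block and the tiny block size are all made-up names and values for illustration, the index is a single level rather than multi-layered, and real HBase also merges versions by timestamp across the MemStore and several HFiles, which is left out here.

```python
import bisect


class ToyHFile:
    """Hypothetical stand-in for an HFile: sorted rows split into blocks."""

    def __init__(self, items, block_size=2):
        rows = sorted(items.items())
        self.blocks = [rows[i:i + block_size]
                       for i in range(0, len(rows), block_size)]
        # Index holds the first key of each block (single level in this toy).
        self.index = [block[0][0] for block in self.blocks]
        self.blocks_read = 0  # counts simulated disk reads

    def find_block(self, key):
        """Return the number of the only block that could hold key, or None."""
        pos = bisect.bisect_right(self.index, key) - 1
        return pos if pos >= 0 else None

    def read_block(self, block_no):
        """Simulate reading one block from disk instead of the whole file."""
        self.blocks_read += 1
        return dict(self.blocks[block_no])


class ToyRegion:
    """Hypothetical region: MemStore, BlockCache and one HFile."""

    def __init__(self, hfile):
        self.memstore = {}      # recent writes, held in memory
        self.block_cache = {}   # block number -> block contents
        self.hfile = hfile

    def put(self, key, value):
        self.memstore[key] = value

    def get(self, key):
        """Return (value, source), checking memory before touching the file."""
        if key in self.memstore:
            return self.memstore[key], "memstore"
        block_no = self.hfile.find_block(key)
        if block_no is None:
            return None, "miss"
        if block_no in self.block_cache:
            source = "blockcache"
        else:
            # Only the one block named by the index is read and cached.
            self.block_cache[block_no] = self.hfile.read_block(block_no)
            source = "hfile"
        return self.block_cache[block_no].get(key), source


if __name__ == "__main__":
    region = ToyRegion(ToyHFile({"row1": "a", "row2": "b",
                                 "row3": "c", "row4": "d"}))
    region.put("row9", "z")
    print(region.get("row9"))  # found in the MemStore
    print(region.get("row3"))  # first access loads one block from the file
    print(region.get("row4"))  # same block, now in the BlockCache
```

In this model only the first read of a block touches the file, and it reads just that block. Every later read of a key in the same block is answered from memory, which is roughly why repeated or nearby lookups stay fast.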
