
How does a data read happen in HBase?


We know HBase is deployed on top of Hadoop and HDFS. We also know that reading a file (or record) from HDFS with the HDFS CLI takes a considerable amount of time.

But even though HBase uses HDFS, it can read a key within a couple of milliseconds. How does this happen?


Solution

  • I think the reasons include:

    1. Data is split across different Region Servers. The client looks up the responsible Region Server in the META table (hbase:meta) and then communicates with that Region Server directly.
    2. Region Servers are collocated with the HDFS DataNodes, which enables data locality (putting the data close to where it is needed) for the data served by the Region Servers.
    3. An HFile contains a multi-layered index, which allows HBase to seek to the data without having to read the whole file.
    4. HBase reads from the BlockCache and MemStore first. If the data can be found in the BlockCache or MemStore, HBase doesn't need to read HFiles from HDFS.
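
To make points 1, 3 and 4 concrete, here is a minimal toy sketch in Python. It is not HBase code and does not use the HBase API: the names `ToyHFile`, `ToyRegion` and `locate_region` are hypothetical, the index has a single level instead of HBase's multi-level one, and a "disk read" is just a counter. It only illustrates the idea of routing a key to a server, checking memory first, and using a sorted index to read one block instead of the whole file.

```python
import bisect


class ToyHFile:
    """Sorted key/value blocks plus a sparse index of each block's first key."""

    def __init__(self, items, block_size=2):
        rows = sorted(items.items())
        self.blocks = [rows[i:i + block_size]
                       for i in range(0, len(rows), block_size)]
        self.index = [block[0][0] for block in self.blocks]
        self.block_reads = 0  # counts simulated disk reads

    def find_block(self, key):
        """Return the number of the only block that could hold key, or None."""
        pos = bisect.bisect_right(self.index, key) - 1
        return pos if pos >= 0 else None

    def read_block(self, block_no):
        """Simulate fetching a single block from disk."""
        self.block_reads += 1
        return dict(self.blocks[block_no])


class ToyRegion:
    """Read path of one region: MemStore, then BlockCache, then the HFile."""

    def __init__(self, hfile):
        self.memstore = {}     # recent writes that are not flushed yet
        self.block_cache = {}  # block number -> block contents
        self.hfile = hfile

    def put(self, key, value):
        self.memstore[key] = value

    def get(self, key):
        if key in self.memstore:               # newest data lives in memory
            return self.memstore[key]
        block_no = self.hfile.find_block(key)  # index lookup, no file scan
        if block_no is None:
            return None
        if block_no not in self.block_cache:   # go to "disk" only on a miss
            self.block_cache[block_no] = self.hfile.read_block(block_no)
        return self.block_cache[block_no].get(key)


def locate_region(meta, key):
    """Pick the server whose start key is the largest one <= key.

    meta is a list of (start_key, server) pairs sorted by start_key,
    and the first start key is the empty string.
    """
    starts = [start for start, _ in meta]
    return meta[bisect.bisect_right(starts, key) - 1][1]
```

With a hypothetical layout such as `meta = [("", "rs1"), ("m", "rs2")]`, `locate_region(meta, "apple")` picks `rs1`. On a `ToyRegion` built over `ToyHFile({"a": 1, "b": 2, "c": 3, "d": 4})`, the first `get("c")` reads one block, and a following `get("d")` is answered from the cached block without another read.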