Confusion about HBase's "region"


In the HBase book I noticed an important concept called a "region".

For example:

Currently, flushing and compactions are done on a per Region basis so if one column family is carrying the bulk of the data bringing on flushes, the adjacent families will also be flushed even though the amount of data they carry is small

Around 50-100 regions is a good number for a table with 1 or 2 column families. Remember that a region is a contiguous segment of a column family

So it seems that a "region" belongs to one or more column families?

I am confused about what exactly a "region" is.


Solution

  • If you check how HBase data is stored on HDFS:

    user@host:~$ hdfs dfs -du -h /hbase/data/default/traffic
    
    284       /hbase/data/default/traffic/.tabledesc
    0         /hbase/data/default/traffic/.tmp
    382.8 M   /hbase/data/default/traffic/08ec69a079692f404c8d2949066f569b
    124.1 M   /hbase/data/default/traffic/0d986ba711e8dee5458090f98cccd446
    110.9 M   /hbase/data/default/traffic/0ea846c84192e3a744a4de907895351e
    271.0 M   /hbase/data/default/traffic/0f1682446b5331bdebbdee64b5a20c4f
    198.3 M   /hbase/data/default/traffic/0f349f966564ae0e87e927cc079aec86
    ...
    

    You will see many folders with hashed names; each folder holds the data of one region.
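A minimal Python sketch, assuming you have captured the text printed by `hdfs dfs -du -h` for a table directory. It keeps only the entries whose last path component looks like a region directory and skips metadata entries such as `.tabledesc` and `.tmp`. The function name `region_dirs` and the "32-character lowercase hex name" pattern are assumptions based on the listing above, not part of HBase.

```python
import re

# A few lines copied from the `hdfs dfs -du -h` listing above.
SAMPLE = """\
284       /hbase/data/default/traffic/.tabledesc
0         /hbase/data/default/traffic/.tmp
382.8 M   /hbase/data/default/traffic/08ec69a079692f404c8d2949066f569b
124.1 M   /hbase/data/default/traffic/0d986ba711e8dee5458090f98cccd446
"""

# Assumption: region folders are 32-character lowercase hex names, as in the listing.
REGION_DIR = re.compile(r"^[0-9a-f]{32}$")


def region_dirs(du_output: str) -> list[tuple[str, str]]:
    """Return (size, region_folder) pairs, skipping .tabledesc, .tmp and similar entries."""
    regions = []
    for line in du_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # The path is the last whitespace-separated field; everything before it is the size.
        size, path = line.rsplit(None, 1)
        name = path.rstrip("/").rsplit("/", 1)[-1]
        if REGION_DIR.match(name):
            regions.append((size, name))
    return regions


if __name__ == "__main__":
    for size, name in region_dirs(SAMPLE):
        print(name, size)
```

To feed it live output instead of the pasted sample, you could pass `subprocess.run(["hdfs", "dfs", "-du", "-h", "/hbase/data/default/traffic"], capture_output=True, text=True).stdout`; the sketch itself never talks to HDFS.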

    Inside each region folder there is one subfolder per column family:

    user@host:~$ hdfs dfs -du -h /hbase/data/default/traffic/f51ec9f3170e9abaf44537e96ebf8560
    
    163      /hbase/data/default/traffic/f51ec9f3170e9abaf44537e96ebf8560/.regioninfo
    243.8 M  /hbase/data/default/traffic/f51ec9f3170e9abaf44537e96ebf8560/r
    124.2 M  /hbase/data/default/traffic/f51ec9f3170e9abaf44537e96ebf8560/z
    

    In my case the table has two column families, r and z. Inside each column family folder you will find the HFiles.
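The folder hierarchy in the listings above can be read as `<root>/<namespace>/<table>/<region>/<family>/<hfile>`. Below is a small Python sketch that splits such a path into those parts; the `StorePath` type and the `parse_store_path` helper are hypothetical names for illustration, and the default root `/hbase/data` is taken from the listings rather than read from any HBase configuration.

```python
from typing import NamedTuple, Optional


class StorePath(NamedTuple):
    namespace: str
    table: str
    region: str
    family: Optional[str]  # None when the path stops at the region folder
    hfile: Optional[str]   # None when the path stops at the family folder


def parse_store_path(path: str, root: str = "/hbase/data") -> StorePath:
    """Split an HDFS path below the HBase data root into its layout components.

    Note: entries such as .regioninfo sit at the family level but are
    metadata files, not column families; this sketch does not filter them.
    """
    prefix = root.rstrip("/") + "/"
    if not path.startswith(prefix):
        raise ValueError(f"{path!r} is not under {root!r}")
    parts = path[len(prefix):].strip("/").split("/")
    if len(parts) < 3:
        raise ValueError(f"{path!r} does not reach a region folder")
    namespace, table, region = parts[:3]
    family = parts[3] if len(parts) > 3 else None
    hfile = parts[4] if len(parts) > 4 else None
    return StorePath(namespace, table, region, family, hfile)


if __name__ == "__main__":
    print(parse_store_path("/hbase/data/default/traffic/f51ec9f3170e9abaf44537e96ebf8560/r"))
```

For the `r` folder above this yields namespace `default`, table `traffic`, region `f51ec9f3170e9abaf44537e96ebf8560`, family `r` and no HFile.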

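    To answer your question: a region is the part of a table that covers a specific contiguous range of row keys. It contains all the column families of the table, each stored separately (one folder of HFiles per family, as in the listing above), so "a contiguous segment of a column family" in the book means the slice of that family's data that falls within the region's key range. If you alter the table schema and add a new column family, every region of the table is updated to include it.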
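To make the "range of row keys, all column families" idea concrete, here is a toy in-memory model in Python. It is not HBase code or the HBase client API; the `Region`/`Table` classes, their methods and the split keys are hypothetical. It assumes HBase's usual convention that a region's start key is inclusive, its end key is exclusive, and an empty key means "open-ended".

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Region:
    start_key: bytes  # inclusive; b"" means "from the first row"
    end_key: bytes    # exclusive; b"" means "to the last row"
    # One store per column family: family name -> {row_key: value}
    stores: dict[str, dict[bytes, bytes]] = field(default_factory=dict)


class Table:
    """Toy table split into regions by row key (split keys must be unique and non-empty)."""

    def __init__(self, families: list[str], split_keys: list[bytes]):
        bounds = [b""] + sorted(split_keys) + [b""]
        # Every region gets a store for every column family of the table.
        self.regions = [
            Region(bounds[i], bounds[i + 1], {f: {} for f in families})
            for i in range(len(bounds) - 1)
        ]
        self._starts = [r.start_key for r in self.regions]

    def region_for(self, row: bytes) -> Region:
        # The owning region is the last one whose start key is <= row.
        return self.regions[bisect_right(self._starts, row) - 1]

    def put(self, row: bytes, family: str, value: bytes) -> None:
        # An unknown family raises KeyError in this toy model.
        self.region_for(row).stores[family][row] = value

    def add_family(self, family: str) -> None:
        # A schema change touches every region, not just one.
        for region in self.regions:
            region.stores.setdefault(family, {})
```

For example, `Table(["r", "z"], [b"g", b"p"])` creates three regions covering [start, g), [g, p) and [p, end); a row such as `b"house"` lands in the middle region, and `add_family("x")` adds an empty `x` store to all three.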