Search code examples
hadoopapache-sparkhbase

Hbase number of regions keep growing


We are using hbase version 1.1.4. The DB has a around 40 tables, and each table data has a TimeToLive specified. It is deployed on a 5 node cluster, and the following is the hbase-site.xml

<property>
<name>phoenix.query.threadPoolSize</name>
<value>2048</value>
</property>

<property>
<name>hbase.hregion.max.filesize</name>
<value>21474836480</value>
</property>

<property>
<name>hbase.hregion.memstore.block.multiplier</name>
<value>4</value>
</property>
<!-- default is 64MB 67108864 -->
<property>
<name>hbase.hregion.memstore.flush.size</name>
<value>536870912</value>
</property>
<!-- default is 7, should be at least 2x compactionThreshold -->
<property>
<name>hbase.hstore.blockingStoreFiles</name>
<value>240</value>
</property>
<property>
<name>hbase.client.scanner.caching</name>
<value>10000</value>
</property>

<property>
<name>hbase.bucketcache.ioengine</name>
<value>offheap</value>
</property>
<property>
<name>hbase.bucketcache.size</name>
<value>40960</value>
</property>

Question is that the number of regions on each of the regionservers keep growing. Currently we only merge regions using

merge_region in the hbase shell.

Is there any way to have only a fixed number of regions, on each server, or an automated way to merge the regions?


Solution

  • Well it mostly depends on your data: how is it distributed across keys. Assuming your values have almost same size for all keys, you can use partitioning:

    For example, if your table key is String and you want 100 regions, use this

    public static byte[] hashKey(String key) {
        int partition = Math.abs(key.hashCode() % 100);
        String prefix = partitionPrefix(partition);
        return Bytes.add(Bytes.toBytes(prefix), ZERO_BYTE, key);
    }
    
    public static String partitionPrefix(int partition) {
        return StringUtils.leftPad(String.valueOf(partition), 2, '0');
    }
    

    In this case, all you keys will be prepended with numbers 00-99, so you have 100 partitions for 100 regions. Now you can disable region splits:

    HTableDescriptor td = new HTableDescriptor(TableName.valueOf("myTable"));
    td.setRegionSplitPolicyClassName("org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy");
    

    or via shell

    alter 'myTable', {TABLE_ATTRIBUTES => {METADATA => {'SPLIT_POLICY' => 'org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy'}}