
How do we distribute MAC addresses evenly across all region servers?


Currently our row key is MAC+REVERSED_TS.

The issue is that all the active MAC addresses being queried by DHCP end up on a single region server, even though our HBase cluster has 3 nodes.

We want to know how to distribute these MAC addresses evenly across all region servers and avoid having the active ones on just one region server.

I see that salting seems to be the usual strategy. Does anyone have a solution for this?


Solution

  • Salting works as follows: whenever you create a row key, you prepend a random number (for example, between 0 and 9), giving a key such as 2+MAC+REVERSED_TS, where + means concatenation.

    Then pre-split your HBase table on these numbers so that each salt value starts its own region:

    create 'mac-records', 'a', SPLITS => ['1', '2', '3', '4', '5', '6', '7', '8', '9']
    

    Choose the number of salt buckets, and therefore the splits, based on the size of your cluster. With only three nodes, for example, you might use three buckets (0-2) and the split points '1' and '2', which gives three regions.

    The downside is that whenever you want to retrieve the rows for a MAC address, you have to run ten scans, because a random salt spreads that MAC's rows across all ten buckets. Running all ten at the same time keeps the extra latency modest. For example, from the HBase shell you'd run:

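    mac = '001A2B3C4D5E'  # placeholder for the MAC address you are looking up
    # ROWPREFIXFILTER bounds each scan to its prefix; a bare PrefixFilter would read the whole table
    scan 'mac-records', {ROWPREFIXFILTER => '0' + mac}
    scan 'mac-records', {ROWPREFIXFILTER => '1' + mac}
    scan 'mac-records', {ROWPREFIXFILTER => '2' + mac}
    ...
    scan 'mac-records', {ROWPREFIXFILTER => '9' + mac}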
    

    This prevents hotspotting because writes are spread across all of your region servers instead of landing on one.
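The write path described above can be sketched in Python. This is a minimal illustration, not HBase client code: `salted_row_key` and `split_points` are hypothetical helpers, and the sketch assumes the MAC is a fixed-length string (12 hex digits without colons), REVERSED_TS is Java's Long.MAX_VALUE minus the epoch-millisecond timestamp stored as a zero-padded decimal string, and there are at most 10 buckets so the salt is one character.

```python
import random

# Assumption: 10 salt buckets (0-9), as in the answer. A single-character
# salt only works for up to 10 buckets.
NUM_BUCKETS = 10
LONG_MAX = 2**63 - 1  # Java Long.MAX_VALUE, commonly used to reverse timestamps


def salted_row_key(mac, ts_millis, rng=None):
    """Hypothetical helper: build SALT + MAC + REVERSED_TS as ASCII bytes."""
    rng = rng or random  # the random module and random.Random both have randrange
    salt = rng.randrange(NUM_BUCKETS)  # random bucket in 0..NUM_BUCKETS-1
    reversed_ts = LONG_MAX - ts_millis  # newer rows get smaller values
    # Zero-pad to 19 digits so byte order matches numeric order.
    return f"{salt}{mac}{reversed_ts:019d}".encode("ascii")


def split_points(num_buckets=NUM_BUCKETS):
    """Split keys '1'..'N-1', giving one pre-split region per salt bucket."""
    return [str(b) for b in range(1, num_buckets)]


# Example: a key for a DHCP request seen at a given epoch-millis timestamp.
print(salted_row_key("001A2B3C4D5E", 1_700_000_000_000))
print(split_points(3))  # ['1', '2']
```

With more than 10 buckets you would need a zero-padded multi-digit salt and matching split keys, otherwise '10' would sort between '1' and '2'.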
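On the read side, the ten per-bucket scans can be issued concurrently and merged. The sketch below simulates the table as an in-memory sorted list of row keys rather than calling a real HBase client; `prefix_scan` and `lookup_mac` are hypothetical names, and it assumes salts 0-9 and fixed-length MACs, so the prefix for one MAC cannot accidentally match a longer one.

```python
import bisect
from concurrent.futures import ThreadPoolExecutor

NUM_BUCKETS = 10  # assumption: salts 0-9, as in the answer


def prefix_scan(sorted_keys, prefix):
    """Stand-in for an HBase prefix scan: return keys starting with prefix."""
    start = bisect.bisect_left(sorted_keys, prefix)  # first key >= prefix
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        matches.append(key)
    return matches


def lookup_mac(sorted_keys, mac):
    """Hypothetical helper: fan out one scan per salt bucket and merge."""
    prefixes = [f"{salt}{mac}".encode("ascii") for salt in range(NUM_BUCKETS)]
    with ThreadPoolExecutor(max_workers=NUM_BUCKETS) as pool:
        per_bucket = list(pool.map(lambda p: prefix_scan(sorted_keys, p), prefixes))
    merged = [key for bucket in per_bucket for key in bucket]
    # Drop the 1-byte salt from the sort key to restore MAC+REVERSED_TS order.
    merged.sort(key=lambda k: k[1:])
    return merged


# Toy keys: SALT + "MAC1"/"MAC2" + one letter standing in for REVERSED_TS.
table = sorted([b"0MAC1X", b"5MAC1Y", b"2MAC2Z", b"9MAC1A"])
print(lookup_mac(table, "MAC1"))  # [b'9MAC1A', b'0MAC1X', b'5MAC1Y']
```

Sorting on the part of the key after the salt puts the rows back in MAC+REVERSED_TS order, so the newest rows for a MAC come first regardless of which bucket they were written to.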