
Strategy for creating splits for an HBase table


   Can anyone suggest strategies for splitting an HBase table? My row keys range over [a-z].

I have split it at {"e","j","o","u"}. Is this an efficient approach?


Solution

  • How is your data distributed?

    When you split a table you need to avoid hotspotting, which can be prevented, for example, with salting. If your row keys are evenly distributed, then your split points are fine.

    But if you don't control your data, it is better to apply salting to your row keys.

    Here is the example from the HBase documentation.

    If you have row keys like this:

    foo001
    foo002
    foo003
    foo004
    

    Then all your rows will go to the same region (here the one between "e" and "j"), which will cause hotspotting.
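
    To check this for a given set of keys, the region lookup can be sketched in a few lines. This is only an illustration of how split points partition the key space, assuming plain byte-wise key ordering; `region_index` and `region_counts` are hypothetical helper names, not part of any HBase API.

```python
from bisect import bisect_right
from collections import Counter

# Split points from the question. Four split points define five regions:
# [start, "e"), ["e", "j"), ["j", "o"), ["o", "u"), ["u", end)
SPLIT_POINTS = [b"e", b"j", b"o", b"u"]


def region_index(row_key: bytes, split_points=SPLIT_POINTS) -> int:
    """Return the index of the region whose key range contains row_key."""
    # A region's start key is inclusive, so a key equal to a split point
    # belongs to the region that starts there; bisect_right gives that.
    return bisect_right(split_points, row_key)


def region_counts(row_keys, split_points=SPLIT_POINTS) -> Counter:
    """Count how many of the given keys fall into each region."""
    return Counter(region_index(key, split_points) for key in row_keys)


sequential = [b"foo001", b"foo002", b"foo003", b"foo004"]
print(region_counts(sequential))  # Counter({1: 4}): all keys in ["e", "j")
```

    Running the same count over a sample of your real row keys gives a quick picture of whether the chosen split points spread the load or concentrate it in one region.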

    That's where salting is important: if you add a random prefix at the beginning of each row key, for example "e", "j", "o" or "u", the rows are spread across the regions:

    e-foo002
    u-foo003
    

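    Your data will then be more evenly distributed. You can apply random or deterministic salting, it's up to you, but a deterministic salt (for example, one derived from a hash of the row key) is better: with a random salt you cannot recompute which prefix a row was written under, so reading a single row means trying every prefix.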
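
    A deterministic salt could look roughly like the sketch below. It assumes the prefix is chosen by a CRC32 hash of the original key and joined with "-" as in the example above; the prefix list and the helper names `salted_key` and `original_key` are illustrative assumptions, not an HBase API.

```python
import zlib

# Hypothetical salt prefixes, reusing the letters from the example above.
SALT_PREFIXES = [b"e", b"j", b"o", b"u"]


def salted_key(row_key: bytes, prefixes=SALT_PREFIXES) -> bytes:
    """Prepend a prefix chosen deterministically from the key itself."""
    # CRC32 is stable across processes, unlike Python's built-in hash(),
    # which is randomized per run for bytes and str.
    bucket = zlib.crc32(row_key) % len(prefixes)
    return prefixes[bucket] + b"-" + row_key


def original_key(salted: bytes) -> bytes:
    """Strip the salt prefix and separator to recover the original key."""
    return salted.split(b"-", 1)[1]


key = b"foo001"
stored = salted_key(key)
# The same key always maps to the same salted key, so a point read can
# recompute the salted key instead of probing every prefix.
assert stored == salted_key(key)
assert original_key(stored) == key
```

    Note that with split points "e", "j", "o", "u" these four prefixes map to four of the five regions; the region before "e" would receive no salted keys, so the prefixes and the split points are best chosen together.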

    As a (very) quick conclusion: if your data is evenly distributed, your splits are fine; otherwise it is better to apply salting.

    EDIT : Might be a good idea to explain this in the documentation on SO.