
HBase and distributed prefix tree


I would like to represent and store a huge list of sequences as a prefix tree spread across many machines, as follows:

  • A master machine represents the prefixes of the sequences.
  • n slave machines represent n sub-prefix trees, each containing the rest of the sequences.

Can I use HBase to solve this problem? If you have any experience with this, could you share it?


Solution

  • Your concept of a "master" may not be quite the same as the HMaster. The HMaster is for administrative purposes, such as keeping track of which RegionServer hosts the Region for a given set of rows of a given table.

    All of the data in the rows of any of your tables lives inside RegionServers. Reads and writes go directly to the RegionServers without any interaction with the HMaster; the only extra step is locating the RegionServer that hosts the rows.

    Coming back to your "master" vs "slaves" machine topology: you might decide to store the sequence prefixes in separate tables. The Regions holding the prefixes can then be managed separately from those holding the sub-prefix trees. In any case, there is no "single master machine" storing the data; instead it lives in one or more Regions on one or more RegionServers.
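
To make the layout above more concrete, here is a minimal sketch in plain Python. It does not use the HBase client API. `SortedTable` is a hypothetical stand-in for a table whose rows are sorted by row key and divided into Regions by start key. The split length `PREFIX_LEN`, the server names, and the Region boundaries are all made up for illustration.

```python
from bisect import bisect_right

PREFIX_LEN = 2  # assumption: how many symbols count as the "prefix"


def split_sequence(seq, prefix_len=PREFIX_LEN):
    """Split a sequence into (prefix, rest)."""
    return seq[:prefix_len], seq[prefix_len:]


class SortedTable:
    """Toy model of a table: rows sorted by key, split into Regions.

    region_start_keys must be sorted and begin with "" so that every
    row key falls into exactly one Region. servers[i] hosts Region i.
    """

    def __init__(self, region_start_keys, servers):
        self.region_start_keys = region_start_keys
        self.servers = servers
        self.rows = {}

    def server_for(self, row_key):
        # A Region covers [start_key, next_start_key), so pick the last
        # Region whose start key is <= row_key.
        idx = bisect_right(self.region_start_keys, row_key) - 1
        return self.servers[idx]

    def put(self, row_key, value):
        self.rows[row_key] = value

    def scan_prefix(self, prefix):
        # Return all row keys that begin with the given prefix, in order.
        return [k for k in sorted(self.rows) if k.startswith(prefix)]


# One small table for the prefixes, one larger table for the subtrees.
prefixes = SortedTable([""], ["rs-prefix-1"])
subtrees = SortedTable(["", "AC", "GA"], ["rs-1", "rs-2", "rs-3"])


def store(seq):
    prefix, rest = split_sequence(seq)
    prefixes.put(prefix, True)
    # Using the full sequence as the row key keeps all sequences that
    # share a prefix next to each other in sort order.
    subtrees.put(prefix + rest, True)


for s in ["ACGT", "ACGA", "GATT", "AATT"]:
    store(s)
```

In this sketch no single machine owns the prefixes: `subtrees.server_for("ACGT")` picks a server purely from the row key, and `subtrees.scan_prefix("AC")` walks one subtree as a contiguous range of sorted keys.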