
HBase Column family locality


HBase runs on five servers with one table that contains a single column family. I need to run some map tasks on it for each key and save the results. The main question is:

To preserve data locality, which is better: creating a new column family on the existing table, or creating a new table?

And the next question:

The HBase documentation suggests keeping fewer than three column families. As I said, I have more than ten map tasks and would like to keep each result in a new column family. What should I do, given that each map task is different from the others? Preserving locality and search cost are both important.


Solution

  • Which one is better: creating a new column family on the existing table, or creating a new table?

    I would recommend caring more about the schema and the simplicity of the table design than about trying to hack HBase internals to get the best performance. If the information in these two column families is related and you need to access both CFs in MapReduce scans, keep them in the same table. If the information is 100% independent and you will never need to scan them together, keep them in different tables. Again, this is a schema design question; don't try to perform premature optimisations.

    Second question: I did not understand what you're asking, sorry.
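
To make the two schema options concrete, here is a minimal sketch using the HBase Java client. It assumes a reasonably recent HBase 2.x client on the classpath (one that provides `TableDescriptorBuilder.setColumnFamily`). The class name `SchemaOptions`, its method names, and the table and family names passed in are all hypothetical and chosen only for illustration. The last method shows the practical consequence of keeping both families in one table: a single `Scan` can read them together.

```java
import java.io.IOException;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical helper class; none of these names come from the question.
final class SchemaOptions {

    // Descriptor for a column family with default settings.
    static ColumnFamilyDescriptor family(String name) {
        return ColumnFamilyDescriptorBuilder.of(name);
    }

    // Option 1: add a result column family to the existing table.
    static void addFamilyToExistingTable(Admin admin, String table, String cf)
            throws IOException {
        admin.addColumnFamily(TableName.valueOf(table), family(cf));
    }

    // Option 2: describe a separate table that holds only the results.
    static TableDescriptor resultsTable(String table, String cf) {
        return TableDescriptorBuilder.newBuilder(TableName.valueOf(table))
                .setColumnFamily(family(cf))
                .build();
    }

    // Option 2, applied: create the separate results table.
    static void createResultsTable(Admin admin, String table, String cf)
            throws IOException {
        admin.createTable(resultsTable(table, cf));
    }

    // With option 1, one scan over one table can read the input family
    // and the result family in the same pass.
    static Scan scanBoth(String dataCf, String resultCf) {
        return new Scan()
                .addFamily(Bytes.toBytes(dataCf))
                .addFamily(Bytes.toBytes(resultCf));
    }
}
```

With separate tables (option 2), reading input and results together would instead need two scans or a lookup per row, which is why the access pattern, rather than locality, is the deciding factor suggested above.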