Tags: hbase, performance-testing, random-access

How can I increase the performance of random get operations on a huge HBase table (10 million records) with small rows (240 bytes on average)?


I have an HBase table with four column families (10 columns in total); the primary key is a fixed 10-byte ID. The average row size is 240 bytes.

When I test random get operations on HBase with 1 million rows, I get 1000+ rows/s, 0.25 MB/s on average.

But when I test the same operation with 10 million rows, I get 160 rows/s, 0.04 MB/s. After reading some material, I increased HBASE_HEAPSIZE from 1 GB to 5 GB; after that I got 320 rows/s, 0.08 MB/s (the cache hit ratio is 87%), but that is still much slower than in the 1 million row test.

Is there any way to increase the performance? Thanks.
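
For reference, my measurement loop looks roughly like the following (a minimal sketch: the table name "mytable" is a placeholder, and in the real test the row keys are sampled from the existing 10-byte IDs rather than generated randomly):

    import java.io.IOException;
    import java.util.Random;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;

    public class RandomGetBenchmark {
        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "mytable"); // placeholder table name
            Random rnd = new Random();
            byte[] key = new byte[10]; // fixed 10-byte row key

            int n = 10000;
            long start = System.currentTimeMillis();
            for (int i = 0; i < n; i++) {
                rnd.nextBytes(key); // the real test picks keys that exist in the table
                Result r = table.get(new Get(key));
                // ... tally the size of r here to compute MB/s
            }
            long ms = System.currentTimeMillis() - start;
            System.out.printf("%d gets in %d ms (%.0f rows/s)%n", n, ms, n * 1000.0 / ms);
            table.close();
        }
    }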


Solution

  • For random gets:

    • decrease the block size to no more than 64 KB; 32 KB should be good (the table-level settings are sketched in code after this list)
    • add a bloom filter on your table, at the row level
    • split your table into multiple regions by setting a low maximum region file size (1 GB or lower) and pre-splitting the table (by country, merchant, or whatever fits your keys)
    • enable the in-memory option on the column families
    • use a fast compression codec (LZO or Snappy are good)
    • use a table pool on your client side (see the sketch at the very end)
    • use memcache (...)
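
    Putting the schema-level points together, here is a rough sketch with the 0.9x Java admin API (the table name, family name, and split points are illustrative, not taken from the question):

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.hbase.HBaseConfiguration;
        import org.apache.hadoop.hbase.HColumnDescriptor;
        import org.apache.hadoop.hbase.HTableDescriptor;
        import org.apache.hadoop.hbase.client.HBaseAdmin;
        import org.apache.hadoop.hbase.io.hfile.Compression;
        import org.apache.hadoop.hbase.regionserver.StoreFile;

        public class CreateTunedTable {
            public static void main(String[] args) throws Exception {
                Configuration conf = HBaseConfiguration.create();
                HBaseAdmin admin = new HBaseAdmin(conf);

                HTableDescriptor desc = new HTableDescriptor("mytable");
                // Keep region files small (1 GB) so the table splits into many regions.
                desc.setMaxFileSize(1024L * 1024L * 1024L);

                HColumnDescriptor cf = new HColumnDescriptor("cf1");
                cf.setBlocksize(32 * 1024);                          // 32 KB blocks for random reads
                cf.setBloomFilterType(StoreFile.BloomType.ROW);      // row-level bloom filter
                cf.setInMemory(true);                                // prefer keeping blocks cached
                cf.setCompressionType(Compression.Algorithm.SNAPPY); // fast compression codec
                desc.addFamily(cf);

                // Pre-split into 16 regions; these split points just step through
                // the first key byte and are purely illustrative.
                byte[][] splits = new byte[15][];
                for (int i = 0; i < 15; i++) {
                    splits[i] = new byte[] { (byte) ((i + 1) * 16) };
                }
                admin.createTable(desc, splits);
                admin.close();
            }
        }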

    Enjoy ;)
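
    And for the table pool point, a minimal sketch using HTablePool from the old (pre-1.0, since removed) client API, assuming several threads share one pool; the table name and pool size are placeholders:

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.hbase.HBaseConfiguration;
        import org.apache.hadoop.hbase.client.Get;
        import org.apache.hadoop.hbase.client.HTableInterface;
        import org.apache.hadoop.hbase.client.HTablePool;

        public class PooledGets {
            // Shared pool: reuses table handles instead of creating one per request.
            private static final HTablePool POOL =
                    new HTablePool(HBaseConfiguration.create(), 50); // up to 50 pooled handles

            static byte[] fetch(byte[] rowKey) throws Exception {
                HTableInterface table = POOL.getTable("mytable"); // placeholder table name
                try {
                    return table.get(new Get(rowKey)).value(); // value of the first cell
                } finally {
                    table.close(); // in 0.92+ this returns the handle to the pool
                }
            }
        }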