
HBase count shell command


I understand that the shell command count returns the number of rows in a table. But what do INTERVAL and CACHE denote here? I checked the web, and almost all websites give the same explanation:

"Current count is shown every 1000 rows by default. Count interval may be optionally specified. Scan caching is enabled on count scans by default. Default cache size is 10 rows. If your rows are small in size, you may want to increase this parameter. Examples:"

I do not understand what they are explaining.

hbase> count 't1', INTERVAL => 100000
hbase> count 't1', CACHE => 1000
hbase> count 't1', INTERVAL => 10, CACHE => 1000

Can anybody explain this in a simple way?


Solution

  • Run the count command on a large table (more than 2000 rows) and you can see how both options work.

    Because a count can take a long time, the shell prints the running total as it goes, like this:

    Current count: 1000, row: ...
    Current count: 2000, row: .....
    Current count: 3000, row: ....
    

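    So with INTERVAL set to 1000 (the default), it prints a progress line each time the count reaches another 1000 rows.

    CACHE is just the scanner caching of the underlying scan: the number of rows fetched from the server in one round trip. Increasing it generally makes the count faster, because fewer round trips are needed, but it costs more memory. That is why the help says: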

    If your rows are small in size, you may want to increase this parameter.
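
To make the two options concrete, here is a toy simulation in plain Ruby. It is only a sketch, not HBase's implementation: the method `simulated_count`, the in-memory `rows` array, and the idea of treating each slice as one "round trip" are all assumptions made for illustration. It shows that INTERVAL only controls how often a progress line appears, while CACHE controls how many rows arrive per fetch.

```ruby
# Toy model of the shell's count command. NOT HBase code: rows come from an
# in-memory array, and each slice stands in for one fetch from the server.
def simulated_count(rows, interval: 1000, cache: 10)
  count = 0
  round_trips = 0
  progress = []

  rows.each_slice(cache) do |batch| # one simulated fetch per CACHE rows
    round_trips += 1
    batch.each do |row|
      count += 1
      # INTERVAL: report progress every `interval` rows
      progress << "Current count: #{count}, row: #{row}" if (count % interval).zero?
    end
  end

  { count: count, round_trips: round_trips, progress: progress }
end

# Hypothetical table with 3500 rows and made-up row keys.
rows = (1..3500).map { |i| format('row%05d', i) }

small_cache = simulated_count(rows, interval: 1000, cache: 10)
large_cache = simulated_count(rows, interval: 1000, cache: 1000)

puts small_cache[:progress]
puts "Total rows: #{small_cache[:count]}"
puts "Fetches with CACHE => 10:   #{small_cache[:round_trips]}"
puts "Fetches with CACHE => 1000: #{large_cache[:round_trips]}"
```

In this model both runs print the same three progress lines (at 1000, 2000 and 3000) and the same total, but the larger cache needs 4 fetches instead of 350. The price is that each fetch holds up to 1000 rows in memory at once, which is why a larger cache suits small rows.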