Tags: hbase, jruby, bigdata, database, nosql

Count number of records in a column family in an HBase table


I'm looking for an HBase shell command that will count the number of records in a specified column family. I know I can run:

echo "scan 'table_name'" | hbase shell | grep column_family_name | wc -l  

However, this runs much slower than the standard counting command, which benefits from CACHE => 50000:

count 'table_name', CACHE => 50000

Worse, it doesn't return the real number of records, but something like the total number of cells (if I'm not mistaken) in the specified column family. I need something like:

count 'table_name' , CACHE => 50000 , {COLUMNS => 'column_family_name'}

Thanks in advance,
Michael


Solution

  • Here is some Ruby code I wrote when I needed the same thing; it is commented throughout. It adds a count_table command to the HBase shell. The first parameter is the table name and the second is a hash of options, the same as for the scan shell command.

    The direct answer to your question is:

    count_table 'your.table', { COLUMNS => 'your.family' }
    

    I also recommend adding a cache setting, as you would for scan:

    count_table 'your.table', { COLUMNS => 'your.family', CACHE => 10000 }
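
As a rough illustration of why the cache setting matters: scanner caching is the number of rows fetched per round trip to the server, so a larger value means fewer round trips. The sketch below is plain Ruby arithmetic, not an HBase API; the helper name `scanner_round_trips` is hypothetical, and real scans can also be limited by result size settings, so treat the figures as an approximation.

```ruby
# Hypothetical helper: approximate number of scanner round trips needed
# to read row_count rows when cache_size rows are fetched per trip.
def scanner_round_trips(row_count, cache_size)
  raise ArgumentError, 'cache_size must be positive' unless cache_size.positive?

  # Integer ceiling division
  (row_count + cache_size - 1) / cache_size
end

puts scanner_round_trips(1_000_000, 1)
puts scanner_round_trips(1_000_000, 10_000)
```

With one million rows, a cache of 1 needs about a million round trips, while a cache of 10000 needs about a hundred.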
    

    And here is the source:

    # Arguments are the same as for the scan command.
    # Examples:
    #
    # count_table 'test.table', { COLUMNS => 'f:c1' }
    # --- Counts rows that have the f:c1 column in 'test.table'.
    #
    # count_table 'other.table', { COLUMNS => 'f' }
    # --- Counts rows that have data in the 'f' family in 'other.table'.
    #
    # count_table 'test.table', { CACHE => 1000 }
    # --- Counts rows with caching.
    #
    def count_table(tablename, args = {})
    
        table = @shell.hbase_table(tablename)
    
        # Open a scanner with the given scan options
        scanner = table._get_scanner(args)
    
        count = 0
        begin
            iter = scanner.iterator
    
            # Iterate results; each one is counted once
            while iter.hasNext
                iter.next
                count += 1
            end
        ensure
            # Always release the scanner, even if iteration fails
            scanner.close
        end
    
        # Return the counter
        count
    end
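
To make the difference between the two approaches concrete, here is a minimal sketch in plain Ruby that does not touch HBase at all. It assumes a hypothetical in-memory list of cells (`CELLS`) standing in for scan output, where each entry is a row key and a column name. Counting matching lines, as the grep pipeline in the question does, gives the number of cells; counting one result per row, as count_table does, gives the number of rows.

```ruby
# Hypothetical stand-in for scan output: one [row_key, column] pair per cell.
CELLS = [
  ['row1', 'f:c1'],
  ['row1', 'f:c2'],
  ['row2', 'f:c1'],
  ['row3', 'g:c1']
].freeze

# Number of cells in the family: what counting matching output lines gives.
def cells_in_family(cells, family)
  cells.count { |_row, column| column.start_with?("#{family}:") }
end

# Number of distinct rows with at least one cell in the family.
def rows_in_family(cells, family)
  cells.select { |_row, column| column.start_with?("#{family}:") }
       .map { |row, _column| row }
       .uniq
       .size
end

puts cells_in_family(CELLS, 'f')
puts rows_in_family(CELLS, 'f')
```

In this sample, family 'f' has three cells spread over two rows, so the two counts differ.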