scala, hadoop, hbase, nosql-aggregation, nosql

How to count all rows in an HBase table using Scala


We can count all rows in the HBase shell with the command count 'table_name', INTERVAL => 1, or simply count 'table_name'.

But how can this be done programmatically in Scala?


Solution

  • I have only done this with the Java client for HBase, but I researched it and found the following. First, the Java way:

    You can use KeyOnlyFilter() so that the scan returns only the keys of the rows, and then loop over the scanner and increment a counter:

    // scanner is the ResultScanner returned by table.getScanner(scan)
    long number = 0;
    for (Result rs = scanner.next(); rs != null; rs = scanner.next()) {
        number++;
    }
    

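    The same approach works from Scala, because Scala can call the Java API directly and the adaptation is relatively easy. The example below shows part of the sample Java code adapted to Scala. It uses the older HBaseAdmin/HTable client API to list the tables and to write and read back one row; it does not itself count rows: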

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.{HBaseAdmin, HTable, Put, Get}
    import org.apache.hadoop.hbase.util.Bytes

    // create() loads hbase-site.xml from the classpath;
    // the no-arg HBaseConfiguration constructor is deprecated
    val conf = HBaseConfiguration.create()
    val admin = new HBaseAdmin(conf)

    // list the tables
    val listtables = admin.listTables()
    listtables.foreach(println)

    // let's insert some data in 'mytable' and get the row
    val table = new HTable(conf, "mytable")

    // write value "one" to column family "ids", qualifier "id1"
    val theput = new Put(Bytes.toBytes("rowkey1"))
    theput.add(Bytes.toBytes("ids"), Bytes.toBytes("id1"), Bytes.toBytes("one"))
    table.put(theput)

    // read the row back and print the stored value
    val theget = new Get(Bytes.toBytes("rowkey1"))
    val result = table.get(theget)
    val value = result.value()
    println(Bytes.toString(value))

    // release client resources
    table.close()
    admin.close()
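
    Putting the two pieces together, a row count in Scala could look roughly like the sketch below. It is not taken from the original answer: it assumes the HBase 1.0+ client API (ConnectionFactory, Connection, Table), a table named "mytable" as in the example above, and an arbitrary scanner caching value of 1000. The object name RowCount and the helper countIterable are made up for illustration.

    ```scala
    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{Connection, ConnectionFactory, Scan}
    import org.apache.hadoop.hbase.filter.KeyOnlyFilter

    object RowCount {

      // Counts the elements of any Java Iterable.
      // A ResultScanner is an Iterable[Result], yielding one Result per row.
      def countIterable(rows: java.lang.Iterable[_]): Long = {
        var n = 0L
        val it = rows.iterator()
        while (it.hasNext) { it.next(); n += 1 }
        n
      }

      // Scans the whole table with KeyOnlyFilter so cell values are not transferred.
      def countRows(connection: Connection, tableName: String): Long = {
        val table = connection.getTable(TableName.valueOf(tableName))
        try {
          val scan = new Scan()
          scan.setFilter(new KeyOnlyFilter())
          scan.setCaching(1000) // rows fetched per RPC; an assumed, tunable value
          val scanner = table.getScanner(scan)
          try countIterable(scanner) finally scanner.close()
        } finally table.close()
      }

      def main(args: Array[String]): Unit = {
        // Reads hbase-site.xml from the classpath
        val conf = HBaseConfiguration.create()
        val connection = ConnectionFactory.createConnection(conf)
        try println(countRows(connection, "mytable")) finally connection.close()
      }
    }
    ```

    This is a full client-side scan, so every row key still travels to the client and the run time grows with the table size. FirstKeyOnlyFilter, which returns only the first cell of each row, is a common alternative filter for counting.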
    

    As additional information, there is also an option that is usually better than hand-written Java or Scala code:

    RowCounter is a MapReduce job that counts all the rows of a table. It is a good utility to use as a sanity check that HBase can read all the blocks of a table if there are any concerns about metadata inconsistency. It runs the MapReduce job in a single process, but it will run faster if you have a MapReduce cluster in place for it to exploit.

    $ hbase org.apache.hadoop.hbase.mapreduce.RowCounter <tablename>
    
    Usage: RowCounter [options] 
        <tablename> [          
            --starttime=[start] 
            --endtime=[end] 
            [--range=[startKey],[endKey]] 
            [<column1> <column2>...]
        ]