We can count all rows using the HBase shell with this command: count 'table_name', INTERVAL => 1
or simply: count 'table_name'
But how do I do this in Scala?
I have only done this with the Java client for HBase, but I researched it and found the following. In Java:
Use a KeyOnlyFilter so the scan returns only the keys of the rows (no values), then loop over the scanner and count the results:
// scanner is the ResultScanner returned by table.getScanner(scan)
long number = 0;
for (Result rs = scanner.next(); rs != null; rs = scanner.next()) {
    number++;
}
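The same loop in Scala, as a minimal sketch. It assumes the HBase 1.0+ client API (ConnectionFactory/Table), the client jars and hbase-site.xml on the classpath, and the table name passed as the first argument. Besides the KeyOnlyFilter from the Java snippet, I also added a FirstKeyOnlyFilter (the shell's count uses that filter too) so each row comes back as a single empty-valued cell. The caching value of 1000 is an arbitrary choice of mine, not a recommended setting.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Scan, Table}
import org.apache.hadoop.hbase.filter.{FilterList, FirstKeyOnlyFilter, KeyOnlyFilter}

object HBaseRowCount {
  // Scan that returns only the first cell of each row, with its value stripped,
  // so as little data as possible is sent to the client.
  def keyOnlyScan(): Scan = {
    val filters = new FilterList(FilterList.Operator.MUST_PASS_ALL)
    filters.addFilter(new FirstKeyOnlyFilter())
    filters.addFilter(new KeyOnlyFilter())
    val scan = new Scan()
    scan.setFilter(filters)
    scan.setCaching(1000)      // rows fetched per RPC (assumed value; tune it)
    scan.setCacheBlocks(false) // a full scan should not fill the block cache
    scan
  }

  // Same loop as the Java snippet: one Result per row, stop when next() is null.
  def countRows(table: Table): Long = {
    val scanner = table.getScanner(keyOnlyScan())
    try {
      var count = 0L
      var rs = scanner.next()
      while (rs != null) {
        count += 1
        rs = scanner.next()
      }
      count
    } finally {
      scanner.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
    try {
      val table = connection.getTable(TableName.valueOf(args(0)))
      try println(s"${args(0)}: ${countRows(table)} rows")
      finally table.close()
    } finally connection.close()
  }
}
```

This still pulls every row key through a single client, so it gets slow on very large tables; that is where RowCounter helps.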
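From Scala you use the same client API. The example below does not count rows by itself; it shows the basics from Scala (listing tables, writing a row with a Put and reading it back with a Get), and the scanner loop above translates to Scala directly: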
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
import org.apache.hadoop.hbase.util.Bytes
// Connection-based client API (HBase 1.0+); create() reads hbase-site.xml from the classpath
val conf = HBaseConfiguration.create()
val connection = ConnectionFactory.createConnection(conf)
val admin = connection.getAdmin
// list the tables
admin.listTableNames().foreach(println)
// insert some data into 'mytable' (column family 'ids') and read the row back
val table = connection.getTable(TableName.valueOf("mytable"))
val thePut = new Put(Bytes.toBytes("rowkey1"))
thePut.addColumn(Bytes.toBytes("ids"), Bytes.toBytes("id1"), Bytes.toBytes("one"))
table.put(thePut)
val theGet = new Get(Bytes.toBytes("rowkey1"))
val result = table.get(theGet)
println(Bytes.toString(result.value()))
// release client resources
table.close()
admin.close()
connection.close()
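The example above only writes and reads one row. Below is a minimal sketch of an actual row count in Scala that also prints progress, roughly like the shell's INTERVAL option. Assumptions: the HBase 1.0+ client API, the table name as the first argument and an optional interval as the second (defaulting to 1000, my own choice), and scala.collection.JavaConverters to treat the ResultScanner as a Scala iterator. countWithProgress is a hypothetical helper of mine, kept generic so it does not depend on HBase types.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Scan}
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter
import org.apache.hadoop.hbase.util.Bytes
import scala.collection.JavaConverters._

object CountWithInterval {
  // Hypothetical helper: counts the iterator's elements and calls `report`
  // with the running count and current element every `interval` elements.
  def countWithProgress[A](rows: Iterator[A], interval: Int)(report: (Long, A) => Unit): Long = {
    require(interval > 0, "interval must be positive")
    var count = 0L
    rows.foreach { row =>
      count += 1
      if (count % interval == 0) report(count, row)
    }
    count
  }

  def main(args: Array[String]): Unit = {
    val tableName = args(0)
    val interval = if (args.length > 1) args(1).toInt else 1000
    val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
    try {
      val table = connection.getTable(TableName.valueOf(tableName))
      // FirstKeyOnlyFilter: only one cell per row is returned
      val scanner = table.getScanner(new Scan().setFilter(new FirstKeyOnlyFilter()))
      try {
        val total = countWithProgress(scanner.iterator().asScala, interval) { (n, r) =>
          println(s"Current count: $n, row: ${Bytes.toStringBinary(r.getRow)}")
        }
        println(s"$total row(s)")
      } finally {
        scanner.close()
        table.close()
      }
    } finally connection.close()
  }
}
```

Passing 1 as the interval prints every row, like INTERVAL => 1 in the shell.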
As additional information, for large tables a better option than a single Java or Scala client scan is RowCounter:
RowCounter is a MapReduce job that counts all the rows of a table. It is a good utility to use as a sanity check to ensure that HBase can read all the blocks of a table if there are any concerns of metadata inconsistency. It will run the MapReduce job all in a single process, but it will run faster if you have a MapReduce cluster in place for it to exploit.
$ hbase org.apache.hadoop.hbase.mapreduce.RowCounter <tablename>
Usage: RowCounter [options]
    <tablename> [
        --starttime=[start]
        --endtime=[end]
        [--range=[startKey],[endKey]]
        [<column1> <column2>...]
    ]
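If you want to start RowCounter from Scala code instead of the command line, here is a sketch. It assumes an HBase version where org.apache.hadoop.hbase.mapreduce.RowCounter implements Hadoop's Tool interface (I believe 1.x and 2.x do; check yours) and the HBase MapReduce jars on the classpath. rowCounterArgs is a hypothetical helper of mine that builds the argument list in the order of the usage text: table name first, then the optional flags and columns.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapreduce.RowCounter
import org.apache.hadoop.util.ToolRunner

object RowCounterLauncher {
  // Hypothetical helper: maps Scala options onto RowCounter's command-line arguments.
  def rowCounterArgs(
      table: String,
      startTime: Option[Long] = None,
      endTime: Option[Long] = None,
      range: Option[(String, String)] = None,
      columns: Seq[String] = Nil): Array[String] = {
    val flags: List[String] =
      startTime.map(t => s"--starttime=$t").toList ++
        endTime.map(t => s"--endtime=$t").toList ++
        range.map { case (start, end) => s"--range=$start,$end" }.toList
    (table :: (flags ++ columns)).toArray
  }

  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()
    // Runs the MapReduce job; the row total is reported in the job counters
    // (the ROWS counter, as far as I know), not returned from run().
    val exitCode = ToolRunner.run(conf, new RowCounter(), rowCounterArgs(args(0), columns = args.drop(1).toSeq))
    sys.exit(exitCode)
  }
}
```

For example, rowCounterArgs("mytable", range = Some(("a", "m")), columns = Seq("ids:id1")) yields the arguments mytable --range=a,m ids:id1.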