I'm using HBase CDH3, and I'm designing my HBase table. Let's say all my rowkeys are hashed, and I have 2 column families colFamA and colFamB. For each row, there will be values stored in either colFamA or colFamB, but not both.
If I set up a scanner to scan over every row, and I specify in my scanner
Scan scan = new Scan();
scan.addFamily(Bytes.toBytes("colFamA"));
hTable.getScanner(scan);
so I only want colFamA values, and not colFamB values, will my scanner still have to scan over rows that contain no data for colFamA (i.e. rows with only colFamB values)? Will the existence of colFamB slow down this scan even though I'm not adding it as a column family to be returned by my scan?
The one-word answer is: no.
The slightly longer answer: HBase does not process unneeded families during a scan at all. Each column family is kept in its own store, with its own files on disk, so there is no reason to read anything from a family you did not specify. If no family is specified, all families are scanned.
An even more detailed explanation: as far as I know, at least in HBase 0.96 there is a RegionScanner interface and a RegionScannerImpl class, which is a member of HRegion. The scanner's constructor checks whether families are specified in your Scan object, and the list of additional scanners is built from the families array (one per store).
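To make the idea concrete, here is a toy model of "one store per family, open only the stores the scan asks for". This is not HBase code: the class FamilyScanSketch, its methods, and the openedStores list are all hypothetical names made up for illustration, and the real RegionScannerImpl is considerably more involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model only, NOT the HBase implementation.
// Each entry in 'stores' stands in for the separate storage of one column family.
class FamilyScanSketch {
    // family name -> (row key -> value), sorted by row key
    private final Map<String, TreeMap<String, String>> stores = new LinkedHashMap<>();

    // Records which per-family stores the most recent scan actually opened.
    final List<String> openedStores = new ArrayList<>();

    void put(String family, String rowKey, String value) {
        stores.computeIfAbsent(family, f -> new TreeMap<>()).put(rowKey, value);
    }

    // An empty family set means "no family specified", so every store is scanned.
    Map<String, String> scan(Set<String> families) {
        openedStores.clear();
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, TreeMap<String, String>> store : stores.entrySet()) {
            String family = store.getKey();
            if (!families.isEmpty() && !families.contains(family)) {
                continue; // unrequested family: its store is never touched
            }
            openedStores.add(family);
            for (Map.Entry<String, String> cell : store.getValue().entrySet()) {
                result.put(cell.getKey() + "/" + family, cell.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        FamilyScanSketch table = new FamilyScanSketch();
        table.put("colFamA", "row1", "a1"); // row with data only in colFamA
        table.put("colFamB", "row2", "b2"); // row with data only in colFamB

        Map<String, String> onlyA = table.scan(Collections.singleton("colFamA"));
        System.out.println(onlyA);              // cells found in colFamA only
        System.out.println(table.openedStores); // stores the scan had to open
    }
}
```

In this sketch, a scan restricted to colFamA never iterates over the colFamB map, so rows that exist only in colFamB cost nothing. That is the same reasoning as in the answer above, under the assumption that each family really is stored separately.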