apache-spark hbase

ColumnPrefixFilter for HBase scan


I have an HBase table with the column families and qualifiers listed below:

maindata
content:master:909
content:master:899
content:master:97832
content:master:9902222
content:master:9
.
.
.
content:master:223343453

I need to add a filter to the HBase Scan object in my Spark Java class so that it fetches only the content:master:anynumber columns. Something like:

    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("content"),Bytes.toBytes("master:[ *[^0-9]. *]"));

so that I only get the content:master:anynumber columns. Does Scan support such a pattern?


Solution

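    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.Filter;
    import org.apache.hadoop.hbase.filter.MultipleColumnPrefixFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    // Keep only columns whose qualifier starts with "master:".
    byte[][] prefixes = new byte[][] {Bytes.toBytes("master:")};
    Filter filter = new MultipleColumnPrefixFilter(prefixes);

    // Restrict the scan to the "content" family, then apply the filter.
    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("content"));
    scan.setFilter(filter);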

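Scan.addColumn matches a qualifier exactly and does not interpret patterns, so the regex in the question is treated as a literal qualifier name. HBase's MultipleColumnPrefixFilter, applied to the content family, returns every column whose qualifier starts with master: instead.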
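
One caveat: a prefix match accepts any qualifier that begins with master:, not only numeric ones. If the table might also hold qualifiers such as master:draft, a stricter check is needed. The sketch below uses only java.util.regex (no HBase classes) to show the difference between the prefix test and a digits-only pattern. The class name, the sample qualifiers, and the pattern itself are illustrative assumptions, not something taken from the question. In HBase, a regex like this could probably be applied server-side with a QualifierFilter and a RegexStringComparator, but check that against the API of your HBase version.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only; it does not touch HBase.
class QualifierMatchSketch {
    // Assumed pattern: "master:" followed by one or more digits, anchored at both ends.
    static final Pattern MASTER_NUMBER = Pattern.compile("^master:[0-9]+$");

    // What a prefix filter effectively checks: the qualifier starts with the prefix.
    static boolean hasPrefix(String qualifier, String prefix) {
        return qualifier.startsWith(prefix);
    }

    // Stricter check: the whole qualifier is "master:" plus digits.
    static boolean isMasterNumber(String qualifier) {
        return MASTER_NUMBER.matcher(qualifier).find();
    }

    public static void main(String[] args) {
        // Sample qualifiers; the last two are made up to show the difference.
        for (String q : List.of("master:909", "master:9", "master:draft", "slave:909")) {
            System.out.println(q + " prefix=" + hasPrefix(q, "master:")
                    + " digitsOnly=" + isMasterNumber(q));
        }
    }
}
```

If every qualifier under master: is numeric, as in the listing in the question, the prefix filter alone is enough and the regex check adds nothing.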