Combine two FilterLists with MUST_PASS_ONE/ALL operators in a single Scan


Consider the following results of scan 'table' in the HBase shell:

ROW COLUMN+CELL
000 column=F:Q, timestamp=1519299345645, value=a
001 column=F:Q, timestamp=1519299345645, value=b
010 column=F:Q, timestamp=1519299345645, value=c
011 column=F:Q, timestamp=1519299345645, value=b
100 column=F:Q, timestamp=1519299345645, value=a
110 column=F:Q, timestamp=1519299345645, value=c
200 column=F:Q, timestamp=1519299345645, value=b
210 column=F:Q, timestamp=1519299345645, value=a

What I want as my scan result:

  • Row key starts with 0 or 1 and
  • Column F:Q value is a or b

Which for the example above is:

ROW COLUMN+CELL
000 column=F:Q, timestamp=1519299345645, value=a
001 column=F:Q, timestamp=1519299345645, value=b
011 column=F:Q, timestamp=1519299345645, value=b
100 column=F:Q, timestamp=1519299345645, value=a

In the HBase shell, it would be (ignore the extra whitespace and newlines, which I added for readability):

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.util.Bytes

scan 'table', {
  COLUMNS => 'F:Q',
  # Filter-language syntax:
  # SingleColumnValueFilter('family', 'qualifier', operator, 'comparatorType:value')
  FILTER => "
    (
      PrefixFilter('0')
      OR
      PrefixFilter('1')
    )
    AND
    (
      SingleColumnValueFilter('F', 'Q', =, 'binary:a')
      OR
      SingleColumnValueFilter('F', 'Q', =, 'binary:b')
    )
  "
}

So consider that I have two filter lists in java:

List<Filter> prefixFilters            = new ArrayList<>();
List<Filter> singleColumnValueFilters = new ArrayList<>();

PrefixFilter one  = new PrefixFilter(Bytes.toBytes("1"));
PrefixFilter zero = new PrefixFilter(Bytes.toBytes("0"));

SingleColumnValueFilter a = new SingleColumnValueFilter(
    Bytes.toBytes("F"),
    Bytes.toBytes("Q"),
    CompareFilter.CompareOp.EQUAL,
    Bytes.toBytes("a") 
);

SingleColumnValueFilter b = new SingleColumnValueFilter(
    Bytes.toBytes("F"),
    Bytes.toBytes("Q"),
    CompareFilter.CompareOp.EQUAL,
    Bytes.toBytes("b") 
);

prefixFilters.add(zero);
prefixFilters.add(one);

singleColumnValueFilters.add(a);
singleColumnValueFilters.add(b);

FilterList prefixFiltersList = new FilterList(FilterList.Operator.MUST_PASS_ONE, prefixFilters);
FilterList singleColumnValueFiltersList = new FilterList(FilterList.Operator.MUST_PASS_ONE, singleColumnValueFilters);

Question: How can I combine them for a single scan.setFilter() with an AND operator, as I did in the shell?


I expected a special FilterList constructor for this, one that accepts a logical operator (AND / OR) and multiple List<Filter> arguments. Since there is none, I'm stuck.


Solution

  • At the end, add a parent FilterList that wraps both lists:

    FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL);
    filters.addFilter(prefixFiltersList);
    filters.addFilter(singleColumnValueFiltersList);
    
    scan.setFilter(filters);
    

    The parent list evaluates both child FilterLists, and MUST_PASS_ALL acts as an AND between them, so a row is returned only if it passes both.

    Why does this work? As per the FilterList JavaDoc:

    Since you can use Filter Lists as children of Filter Lists, you can create a hierarchy of filters to be evaluated.
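
To make the nesting concrete, here is a minimal plain-Java sketch of the same hierarchy applied to the eight sample rows from the question. It does not use the HBase API at all: the Row class, the mustPassOne / mustPassAll helpers, and the class name NestedFilterSketch are hypothetical stand-ins that only model how an AND over two OR groups is evaluated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Plain-Java model of nested filter lists. Not HBase code: every name here is
// a hypothetical stand-in used only to illustrate the evaluation order.
class NestedFilterSketch {

    /** Stand-in for one row of the table: the row key and the value of F:Q. */
    static final class Row {
        final String key;
        final String value;

        Row(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Models MUST_PASS_ONE: a row passes if at least one child accepts it. */
    static Predicate<Row> mustPassOne(List<Predicate<Row>> children) {
        return row -> children.stream().anyMatch(child -> child.test(row));
    }

    /** Models MUST_PASS_ALL: a row passes only if every child accepts it. */
    static Predicate<Row> mustPassAll(List<Predicate<Row>> children) {
        return row -> children.stream().allMatch(child -> child.test(row));
    }

    /** Builds (prefix 0 OR prefix 1) AND (value a OR value b) and applies it. */
    static List<String> matchingRowKeys() {
        List<Row> table = Arrays.asList(
            new Row("000", "a"), new Row("001", "b"),
            new Row("010", "c"), new Row("011", "b"),
            new Row("100", "a"), new Row("110", "c"),
            new Row("200", "b"), new Row("210", "a"));

        // Leaf conditions, playing the role of PrefixFilter and SingleColumnValueFilter.
        Predicate<Row> keyStartsWith0 = row -> row.key.startsWith("0");
        Predicate<Row> keyStartsWith1 = row -> row.key.startsWith("1");
        Predicate<Row> valueIsA = row -> row.value.equals("a");
        Predicate<Row> valueIsB = row -> row.value.equals("b");

        // Two OR groups, playing the role of the two MUST_PASS_ONE lists.
        Predicate<Row> prefixGroup = mustPassOne(Arrays.asList(keyStartsWith0, keyStartsWith1));
        Predicate<Row> valueGroup = mustPassOne(Arrays.asList(valueIsA, valueIsB));

        // The parent AND, playing the role of the MUST_PASS_ALL list.
        Predicate<Row> combined = mustPassAll(Arrays.asList(prefixGroup, valueGroup));

        List<String> keys = new ArrayList<>();
        for (Row row : table) {
            if (combined.test(row)) {
                keys.add(row.key);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(matchingRowKeys()); // prints [000, 001, 011, 100]
    }
}
```

The output matches the four rows listed as the desired result in the question. In real HBase code, FilterList also offers a constructor that takes the operator followed by the child filters as varargs, so, assuming your version has it, the parent can be built in one line: new FilterList(FilterList.Operator.MUST_PASS_ALL, prefixFiltersList, singleColumnValueFiltersList).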