
Can I use SingleColumnValueFilter on rowkey in HBase?


HBase version: 1.2.2 (both server and Java API)

public SingleColumnValueFilter(byte[] family,
                               byte[] qualifier,
                               CompareFilter.CompareOp compareOp,
                               ByteArrayComparable comparator)

I am using org.apache.hadoop.hbase.filter.RegexStringComparator to perform a LIKE-style query on the rowkey.

It works fine with columns, but it returns all the records if I use the rowkey instead of a column.


Solution

  • Column value filters and RowFilters are different.

    Value filters operate on column values (with the possibility of a full table scan), whereas RowFilters work on the rowkey.


    SingleColumnValueFilter :

    This filter takes a column family, a qualifier, a compare operator and a comparator. If the specified column is not found, all the columns of that row will be emitted. If the column is found and the comparison with the comparator returns true, all the columns of the row will be emitted. If the condition fails, the row will not be emitted.

    This filter also takes two additional optional boolean arguments: filterIfColumnMissing and setLatestVersionOnly.

    If the filterIfColumnMissing flag is set to true, the columns of the row will not be emitted if the specified column to check is not found in the row. The default value is false.

    If the setLatestVersionOnly flag is set to false, it will test previous versions (timestamps) too. The default value is true.

    These flags are optional, but you must set either both of them or neither.
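
    The emit/skip rules above can be sketched as a small plain-Java model. This is not the HBase API: the class SingleColumnValueFilterModel and the method emitsRow are hypothetical names used only for illustration, cell values are simplified to strings, and the version handling controlled by setLatestVersionOnly is left out. The model may also help explain the behaviour in the question: a rowkey is not a column, so a filter pointed at it would likely fall into the "column not found" case, where every row is emitted by default.

```java
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical model of the decision rules described above; not HBase code.
public class SingleColumnValueFilterModel {

    /**
     * Models whether a row is emitted.
     *
     * @param cellValue             value of the checked column, empty if the row lacks it
     * @param comparison            stands in for the compare operator plus comparator
     * @param filterIfColumnMissing models the optional flag (default false)
     */
    public static boolean emitsRow(Optional<String> cellValue,
                                   Predicate<String> comparison,
                                   boolean filterIfColumnMissing) {
        if (!cellValue.isPresent()) {
            // Column missing: emitted by default, dropped only when the flag is true.
            return !filterIfColumnMissing;
        }
        // Column present: the comparison result decides.
        return comparison.test(cellValue.get());
    }

    public static void main(String[] args) {
        // "Delhi" comes from the shell example; "Mumbai" is made-up sample data.
        Predicate<String> isDelhi = "Delhi"::equals;
        System.out.println(emitsRow(Optional.of("Delhi"), isDelhi, false));  // true
        System.out.println(emitsRow(Optional.of("Mumbai"), isDelhi, false)); // false
        System.out.println(emitsRow(Optional.empty(), isDelhi, false));      // true
        System.out.println(emitsRow(Optional.empty(), isDelhi, true));       // false
    }
}
```

    The third call is the interesting one: with the default flag value, a row that lacks the checked column passes the filter untouched.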

    Syntax

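    SingleColumnValueFilter('<family>', '<qualifier>', <compare operator>, '<comparator>', <filterIfColumnMissing_boolean>, <latest_version_boolean>)

        SingleColumnValueFilter('<family>', '<qualifier>', <compare operator>, '<comparator>')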
    

    Example :

    hbase(main):020:0> scan 'airline' ,{ FILTER => "SingleColumnValueFilter('flightbetween','source',=, 'binary:Delhi')" }
    

    If you want to perform a "LIKE" query on the rowkey, you can use PrefixFilter or the more advanced FuzzyRowFilter.

    PrefixFilter: This filter takes one argument, a prefix of a row key. It returns only those key-values present in a row that starts with the specified row prefix.
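
    To make the rowkey matching concrete, here is a minimal plain-Java sketch of which row keys a prefix match and a regex match would select. It is only a model, not the HBase API: RowKeyMatchModel, selectByPrefix and selectByRegex are hypothetical names, and the row keys are made-up sample data. The regex variant assumes substring-style matching (Java's Matcher.find), which is how a RegexStringComparator applied to the rowkey through a RowFilter is expected to behave; treat that as an assumption to verify against your HBase version.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical model of rowkey matching; not HBase code.
public class RowKeyMatchModel {

    /** True when rowKey begins with prefix, compared byte for byte. */
    public static boolean hasPrefix(byte[] rowKey, byte[] prefix) {
        if (rowKey.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (rowKey[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /** Keeps the row keys that start with the given prefix (LIKE 'prefix%'). */
    public static List<String> selectByPrefix(List<String> rowKeys, String prefix) {
        byte[] prefixBytes = prefix.getBytes(StandardCharsets.UTF_8);
        List<String> selected = new ArrayList<>();
        for (String key : rowKeys) {
            if (hasPrefix(key.getBytes(StandardCharsets.UTF_8), prefixBytes)) {
                selected.add(key);
            }
        }
        return selected;
    }

    /** Keeps the row keys in which the regex matches anywhere (LIKE '%text%'). */
    public static List<String> selectByRegex(List<String> rowKeys, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> selected = new ArrayList<>();
        for (String key : rowKeys) {
            // Assumption: find() semantics, so no leading or trailing ".*" is needed.
            if (pattern.matcher(key).find()) {
                selected.add(key);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Made-up row keys for illustration only.
        List<String> rowKeys = Arrays.asList("DEL-BOM-001", "DEL-BLR-002", "BOM-DEL-003");
        System.out.println(selectByPrefix(rowKeys, "DEL-")); // [DEL-BOM-001, DEL-BLR-002]
        System.out.println(selectByRegex(rowKeys, "BLR"));   // [DEL-BLR-002]
    }
}
```

    Against a real table, the corresponding filters would be PrefixFilter for the first case and a RowFilter carrying the RegexStringComparator for the second, instead of SingleColumnValueFilter.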