Search code examples
hbaseapache-pig

scan hbase table for row where column field value is empty


I would like to perform a quick filtered scan in the hbase shell to retrieve any records where a specific column value is null or empty. I was told I can do this in PIG as well but don't know how to do that either. Can someone please assist with either method to get this resolved. I will as last option do in java but just thought this would be faster?

 **TableName, columnFamily, column**
 test:table     collected:url

Thanks.

EDIT:

Tried the following:

  scan 'test:table',{ COLUMNS => 'collected:url', LIMIT => 10, FILTER => "ValueFilter( =, 'binaryprefix:<>' )"}

No results so maybe it works but does this look correct?

EDIT 2:

Error returned:

     hbase(main):008:0> scan 'test:table', {FILTER => 
    SingleColumnValueFilter.new(Bytes.toBytes('collected'), 
    Bytes.toBytes('url'), CompareFilter::CompareOp.valueOf('EQUAL'), 
    Bytes.toBytes(''), true, true), COLUMNS => 'collected:url' }
    ArgumentError: wrong number of arguments (6 for 4)

Solution

  • You don't really need pig for solving your problem.

    In Hbase shell SingleColumnValueFilter should help you. You need to set filterIfMissing to false, what is actually the default value. So your scan should look like:

    import org.apache.hadoop.hbase.filter.CompareFilter
    import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
    import org.apache.hadoop.hbase.filter.SubstringComparator
    import org.apache.hadoop.hbase.util.Bytes
    
    scan 'YourTable', {FILTER => SingleColumnValueFilter.new(Bytes.toBytes('family'), Bytes.toBytes('field'), CompareFilter::CompareOp.valueOf('EQUAL'), Bytes.toBytes('')), COLUMNS => 'family:field' }