HBase Stargate Scanner - startRow and endRow not working


I'm having major issues scanning a table with HBase Stargate. My HBase row key schema is basically objectidnumber_languagecode_date_randomhash, e.g.

1_en_2014-12-12_1432ae341
1_en_2014-13-13_234fe321
2_en_2014-01-14_243a43fe
...
342342_uk_2014-01-14_2234af3

I want to scan the table for all rows whose key starts with a given objectidnumber. I think the issue is that the objectidnumbers are sequential and have varying numbers of digits, but I'm not totally sure.

When using HBase shell, the command I'm using is:

scan 'object_articles', { STARTROW => '33_', ENDROW => '34' }

This should give me every row that starts with 33_ and stop as soon as it reaches 34, which is what the results show:

hbase(main):012:0> scan 'object_articles', { STARTROW => '33_', ENDROW => '34' }
ROW                                         COLUMN+CELL
 33_en_2004_zdfasdf                         column=cf:articleId, timestamp=1398803544834, value=en_2004_zdfasdf
 33_en_2004_zdfasdf                         column=cf:articleTitle, timestamp=1398803544834, value=Testing
 33_en_2004_zdfasdf                         column=cf:index, timestamp=1398803544834, value=en_2004
1 row(s) in 0.0120 seconds
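
The range semantics used by the shell scan above (STARTROW inclusive, ENDROW exclusive, keys in sorted order) can be reproduced without a cluster. This is only a minimal sketch: it assumes ASCII row keys, so that Java's String ordering matches HBase's byte ordering, and it uses a TreeSet as a stand-in for the sorted table. RangeScanSketch and scan are hypothetical names, not part of HBase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class RangeScanSketch {
    // Returns every key k with startRow <= k < endRow, in sorted order.
    // Mirrors the shell's STARTROW (inclusive) / ENDROW (exclusive) range.
    public static List<String> scan(TreeSet<String> keys, String startRow, String endRow) {
        return new ArrayList<>(keys.subSet(startRow, true, endRow, false));
    }

    public static void main(String[] args) {
        // Sample keys taken from the question; the TreeSet keeps them sorted.
        TreeSet<String> keys = new TreeSet<>(List.of(
                "1_en_2014-12-12_1432ae341",
                "2_en_2014-01-14_243a43fe",
                "33_en_2004_zdfasdf",
                "342342_uk_2014-01-14_2234af3"));

        // Prints [33_en_2004_zdfasdf]
        System.out.println(scan(keys, "33_", "34"));
    }
}
```

Note that 342342_... is excluded because it sorts after 34, not because 342342 is a larger number than 34.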

However, when I set up my Stargate scanner with this simple XML:

<Scanner startRow="33_" endRow="34" />

It gives me back every row in the entire table. I also noticed that a 4-digit startRow/endRow yields a 204 No Content response, while any 3-digit startRow/endRow brings back the entire table.

All results:

<Scanner startRow="999_" endRow="1000" />

204 No Content:

I'm pretty perplexed as to why the shell scan works fine while the Stargate XML doesn't.


Solution

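  • I suppose it was because I was posting at 2 AM, but this turned out to be really simple: I hadn't quite wrapped my head around lexicographic ordering.

    Row keys are compared byte by byte as strings, not as numbers, so 99_ < 9_ (the character 9 sorts before _). My original idea of using the next number as the endRow therefore wasn't going to work. I ended up adding a PrefixFilter on the startRow and dropping the endRow, so that the scan only returns rows starting with the object ID: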

    In Java:

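        import org.apache.hadoop.hbase.filter.PrefixFilter;
        import org.apache.hadoop.hbase.rest.model.ScannerModel;
        import org.apache.hadoop.hbase.util.Bytes;

        // xml is assumed to be a StringBuilder; startRow is the key prefix, e.g. "99_".
        xml.append("<Scanner startRow=\"").append(startRow).append("\">");

        // Prefix filter: keep only rows whose key begins with startRow.
        PrefixFilter prefixFilter = new PrefixFilter(Bytes.toBytes(startRow));
        // stringifyFilter serializes the filter to the JSON form shown below.
        xml.append("<filter>").append(ScannerModel.stringifyFilter(prefixFilter)).append("</filter>");

        xml.append("</Scanner>");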
    

    How it looks with "99_" as the startRow (the filter value OTlf is the Base64 encoding of 99_):

    <Scanner startRow="99_">
        <filter>
            {"type":"PrefixFilter","value":"OTlf"}
        </filter>
    </Scanner>
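
The ordering argument and the encoded filter value above can be checked with plain Java, without any HBase dependency. This is a rough sketch under the assumption that row keys are ASCII, so String.compareTo gives the same order as HBase's byte comparison. PrefixSketch, isForwardRange, and encodePrefix are hypothetical helper names used only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class PrefixSketch {
    // True when startRow sorts strictly before endRow, i.e. the range is non-empty
    // in lexicographic order. Numeric intuition does not apply here.
    public static boolean isForwardRange(String startRow, String endRow) {
        return startRow.compareTo(endRow) < 0;
    }

    // Base64-encodes the prefix bytes, matching the "value" field in the filter JSON.
    public static String encodePrefix(String prefix) {
        return Base64.getEncoder().encodeToString(prefix.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // '9' (0x39) sorts before '_' (0x5F), so "99_" comes before "9_".
        System.out.println("99_".compareTo("9_") < 0);       // prints true

        // "999_" sorts after "1000", so this start/end pair is not a forward range.
        System.out.println(isForwardRange("999_", "1000"));  // prints false

        // A prefix match ignores what follows the prefix and excludes other IDs.
        System.out.println("99_en_2014".startsWith("99_"));  // prints true
        System.out.println("990_en_2014".startsWith("99_")); // prints false

        System.out.println(encodePrefix("99_"));             // prints OTlf
    }
}
```

Matching on the prefix sidesteps the problem of choosing an endRow entirely, which is why the filter-based scanner does not depend on how many digits the ID has.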