Accumulo Range - End Key Not Inclusive


I am learning Accumulo and cannot seem to get the end key specified in a Range to be inclusive. My code is below. I have tried explicitly setting the endKeyInclusive to true in Range, but that didn't help.

BatchWriter writer = conn.createBatchWriter("table", config);

List<String> deterTimes = new ArrayList<>();

String rowId = "3015551212<ll>";
String columnFamily = "deter";
for (int i = 0; i < 10; i++) {
    String deterTime = "20181112:21:46:33" + i;
    deterTimes.add(deterTime);
    writer.addMutation(makeRecord(rowId, columnFamily, deterTime, "DETER" + i));                   
}

writer.flush();
writer.close();

Scanner scan = conn.createScanner("table", auths);

Key startKey = new Key(rowId.getBytes(), columnFamily.getBytes(), deterTimes.get(1).getBytes());
Key endKey = new Key(rowId.getBytes(), columnFamily.getBytes(), deterTimes.get(4).getBytes());
Range range = new Range(startKey, endKey);
if (range.isEndKeyInclusive())  System.out.println("true");
scan.setRange(range);

for (Entry<Key,Value> entry : scan) {
    Text row = entry.getKey().getRow();
    Text cq = entry.getKey().getColumnQualifier();
    Value value = entry.getValue();
    System.out.println("Fetched row " + row + " with value: " + value + ", cq=" + cq);
}

Output:

true
Fetched row 3015551212<ll> with value: DETER1, cq='20181112:21:46:331'
Fetched row 3015551212<ll> with value: DETER2, cq='20181112:21:46:332'
Fetched row 3015551212<ll> with value: DETER3, cq='20181112:21:46:333'

Solution

  • You are constructing your end key with ( row, column family, column qualifier ) as byte arrays, and the remaining dimensions of the key ( column visibility, timestamp ) set to default values (specifically, an empty byte array and Long.MAX_VALUE, respectively).

    The scanner will stop at that exact key, inclusively. However, your actual data entry is almost certainly not that exact key (you didn't provide your implementation of makeRecord to verify). Even if your data actually has an empty column visibility, the timestamp is almost certainly not Long.MAX_VALUE. It is either something you set in your makeRecord implementation, or it was assigned from the tserver's time or the table's logical time counter. Because the timestamp dimension of the key is ordered descending, an end key with timestamp Long.MAX_VALUE sorts before any real entry with the same row and column, so your scanner stops at the end key before it reaches your entry.

    This is a bit like searching a dictionary for analogy, but stopping when you reach analog: you'll miss the remaining words that begin with analog.

    This is a common pitfall when constructing ranges based on exact keys. It is generally better to construct ranges based on rows rather than keys (there is a Range constructor for that, and a range that is inclusive on a row includes the entire row). Alternatively, specify the end key so that it works exclusively. You can do this by appending a null byte to the last meaningful element of the key (here, the column qualifier) and making the end of the range exclusive. For example, you can do something like:

    Key endKey = new Key(rowId.getBytes(),
                         columnFamily.getBytes(),
                         (deterTimes.get(4) + "\0").getBytes());
    Range range = new Range(startKey, true, endKey, false);
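
To see why the exact end key misses the last entry while the null-byte version catches it, here is a minimal sketch that models the sort order using only the JDK. It does not use the real Accumulo Key or Range classes: SimpleKey, KeyOrderSketch, and the write timestamp of 1000 are all hypothetical stand-ins, and column visibility is left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Simplified model of key ordering; NOT the real Accumulo Key/Range classes.
final class KeyOrderSketch {

    // Hypothetical stand-in for a key (column visibility omitted for brevity).
    static final class SimpleKey implements Comparable<SimpleKey> {
        final String row, family, qualifier;
        final long timestamp;

        SimpleKey(String row, String family, String qualifier, long timestamp) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }

        @Override
        public int compareTo(SimpleKey o) {
            int c = row.compareTo(o.row);
            if (c == 0) c = family.compareTo(o.family);
            if (c == 0) c = qualifier.compareTo(o.qualifier);
            // Timestamps sort descending: Long.MAX_VALUE comes first.
            if (c == 0) c = Long.compare(o.timestamp, timestamp);
            return c;
        }
    }

    static final String ROW = "3015551212<ll>";
    static final String CF = "deter";
    static final String PREFIX = "20181112:21:46:33";

    // Ten entries, each with an assumed write timestamp of 1000.
    static TreeSet<SimpleKey> sampleTable() {
        TreeSet<SimpleKey> table = new TreeSet<>();
        for (int i = 0; i < 10; i++) {
            table.add(new SimpleKey(ROW, CF, PREFIX + i, 1000L));
        }
        return table;
    }

    // Collects the qualifiers between the start and end keys, like a range scan.
    // Both boundary keys carry the default timestamp Long.MAX_VALUE.
    static List<String> scan(String endQualifier, boolean endInclusive) {
        SimpleKey start = new SimpleKey(ROW, CF, PREFIX + 1, Long.MAX_VALUE);
        SimpleKey end = new SimpleKey(ROW, CF, endQualifier, Long.MAX_VALUE);
        List<String> hits = new ArrayList<>();
        for (SimpleKey k : sampleTable().subSet(start, true, end, endInclusive)) {
            hits.add(k.qualifier);
        }
        return hits;
    }

    // Exact qualifier, inclusive end: the stored entry sorts after the end key.
    static List<String> naiveScan() {
        return scan(PREFIX + 4, true);
    }

    // Qualifier plus a null byte, exclusive end: the stored entry sorts before it.
    static List<String> fixedScan() {
        return scan(PREFIX + 4 + "\0", false);
    }

    public static void main(String[] args) {
        System.out.println("naive: " + naiveScan());
        System.out.println("fixed: " + fixedScan());
    }
}
```

In this model the naive scan returns the qualifiers ending in 1, 2 and 3, the same three entries as in the question's output, while the scan with the null byte appended also returns the one ending in 4.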
    

    Another pitfall to watch for is calling String.getBytes() without specifying an encoding, since the result then depends on the JVM's default charset. It would be better to use something consistent, like "abc".getBytes(StandardCharsets.UTF_8) (I usually do a static import, though, so I can specify only UTF_8).
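
As a small illustration of the static-import style, the sketch below wraps the conversion in two helpers. The class name CharsetSketch and the methods toBytes and fromBytes are hypothetical, chosen only for this example.

```java
import static java.nio.charset.StandardCharsets.UTF_8;

// Hypothetical helper class: always encode and decode with an explicit charset.
final class CharsetSketch {

    // Encodes a string as UTF-8, independent of the JVM's default charset.
    static byte[] toBytes(String s) {
        return s.getBytes(UTF_8);
    }

    // Decodes UTF-8 bytes back into a string.
    static String fromBytes(byte[] bytes) {
        return new String(bytes, UTF_8);
    }

    public static void main(String[] args) {
        byte[] cf = toBytes("deter");
        System.out.println(cf.length + " bytes, round trip: " + fromBytes(cf));
    }
}
```

Using the same helpers for both writing mutations and building scan keys keeps the byte representation identical on every machine.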