I'm using BaseTokenStreamTestCase to run some tests against a custom TokenFilter.
The test is failing in an inexplicable way.
As my debug output shows, the token it's complaining about has an endOffset of 17:
...
inconsistent endOffset 1 pos=1 posLen=1 token=hello expected:<11> but was:<17>
original:  wheel chair hello there foo bar
increment: 1           1     1     1
tokens:    wheel chair hello there foo bar
positions: ----------- ----- ----- -------
lengths:   2           1     1     2
sequence:  1           2     3     4
           0123456789012345678901234567890
                     10        20        30
start-end: 1:[0-11], 2:[12-17], 3:[18-23], 4:[24-31]
Here's the test code:
assertAnalyzesTo(analyzer, input,
    new String[] {"wheel chair", "hello", "there", "foo bar"},
    new int[] {0, 12, 18, 24},   // start offsets
    new int[] {11, 17, 23, 31},  // end offsets
    null,                        // types
    new int[] {1, 1, 1, 1},      // positionIncrement
    new int[] {2, 1, 1, 2});     // positionLength
Why does it think the 2nd token should end at 11?
BaseTokenStreamTestCase is generating the error from this source: ... near line 248
final int endPos = pos + posLength;
if (!posToEndOffset.containsKey(endPos)) {
    // First time we've seen a token arriving to this position:
    posToEndOffset.put(endPos, endOffset);
    //System.out.println(" + e " + endPos + " -> " + endOffset);
} else {
    // We've seen a token arriving to this position
    // before; verify the endOffset is the same:
    //System.out.println(" + ve " + endPos + " -> " + endOffset);
    assertEquals("inconsistent endOffset " + i + " pos=" + pos + " posLen=" + posLength + " token=" + termAtt, posToEndOffset.get(endPos).intValue(), endOffset);
}
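Because endPos is calculated as pos + posLength, the test records the end offset of the first token that arrives at each position. It then expects every later token arriving at that same position to have the same end offset, which it looks up with posToEndOffset.get(endPos).
The first token has a position length of 2, so it arrives at position 0 + 2 = 2 with an end offset of 11. The second token ("hello", pos=1, posLen=1) also arrives at position 2, so the test expects its end offset to be 11 rather than 17.
In effect, the position length of 2 makes the first token read ahead by one token, into the position where "hello" ends.
This is why the test is failing: the position length is being used improperly.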
Leaving the position length attribute at its default of 1 fixed the test failures.
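
To see the check in isolation, here is a minimal sketch that replays the same bookkeeping on plain arrays, without Lucene. The class name EndOffsetCheck and the method firstInconsistentToken are hypothetical names made up for this illustration, and starting pos at -1 is an assumption inferred from the error message (the second token is reported with pos=1 after two increments of 1).

```java
import java.util.HashMap;
import java.util.Map;

public class EndOffsetCheck {

    /**
     * Replays the posToEndOffset consistency check on plain arrays.
     * Returns the index of the first token whose end offset conflicts
     * with an earlier token arriving at the same position, or -1 if
     * all tokens are consistent.
     */
    public static int firstInconsistentToken(int[] posIncs, int[] posLens, int[] endOffsets) {
        Map<Integer, Integer> posToEndOffset = new HashMap<>();
        int pos = -1; // assumption: positions start before the first token
        for (int i = 0; i < posIncs.length; i++) {
            pos += posIncs[i];
            int endPos = pos + posLens[i];
            // Record the end offset the first time a token arrives at endPos;
            // otherwise compare against the one already recorded.
            Integer seen = posToEndOffset.putIfAbsent(endPos, endOffsets[i]);
            if (seen != null && seen != endOffsets[i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] posIncs = {1, 1, 1, 1};
        int[] endOffsets = {11, 17, 23, 31};

        // Values from the failing test: first and last tokens have posLength 2.
        System.out.println(firstInconsistentToken(posIncs, new int[] {2, 1, 1, 2}, endOffsets));

        // Same tokens with the default posLength of 1 everywhere.
        System.out.println(firstInconsistentToken(posIncs, new int[] {1, 1, 1, 1}, endOffsets));
    }
}
```

With the values from the failing test, the sketch flags token index 1 ("hello"), which matches the "inconsistent endOffset 1" in the error message. With every position length left at 1, no conflict is found.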