Lucene QueryParser Analyzer inconsistency


I have a very simple Analyzer that tries to replace forward slashes (/) with spaces. Because QueryParser forces me to escape strings containing slashes before parsing, I added a MappingCharFilter to the analyzer that replaces "\/" with a single space. The analyzer is defined as follows:

@Override
protected TokenStreamComponents createComponents(String field, Reader in) {
    NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
    builder.add("\\/", " ");
    Reader mappingFilter = new MappingCharFilter(builder.build(), in);

    Tokenizer tokenizer = new WhitespaceTokenizer(version, mappingFilter);
    return new TokenStreamComponents(tokenizer);
}

Then I use this analyzer in the QueryParser to parse a string with slashes:

String text = QueryParser.escape("one/two");
QueryParser parser = new QueryParser(Version.LUCENE_48, "f", new MyAnalyzer(Version.LUCENE_48));
System.err.println(parser.parse(text));

I expected the output to be:

f:one f:two

However, I get:

f:one/two

The puzzling thing is that when I debug the analyzer on its own with the escaped string, it tokenizes the input correctly, returning two tokens instead of one.

What is going on?

Thanks.


Solution

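  • The fix is simple: don't escape the forward slash in the first argument of builder.add. QueryParser strips the escaping backslash before it hands the term text to the analyzer, so the analyzer receives "one/two" and a mapping for "\/" never matches. Feeding the escaped string to the analyzer directly does match, which is why it looked correct in the debugger. Map the bare slash instead: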

    builder.add("/", " ");
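
    To see why the order of operations matters, here is a minimal plain-Java sketch that does not use Lucene at all. The class EscapeOrderSketch and its methods unescape and analyze are hypothetical stand-ins: unescape approximates the parser dropping escape backslashes, and analyze approximates a character mapping followed by whitespace tokenization. It is only a simplified model of the behaviour, not the library's actual code.

```java
import java.util.Arrays;
import java.util.List;

public class EscapeOrderSketch {

    // Hypothetical stand-in for the parser's unescaping step:
    // drops each backslash and keeps the character that follows it.
    public static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                out.append(s.charAt(++i));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Hypothetical stand-in for a char mapping followed by whitespace tokenization.
    public static List<String> analyze(String input, String from, String to) {
        String mapped = input.replace(from, to);
        return Arrays.asList(mapped.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        String escaped = "one\\/two";              // the text one\/two, as returned by QueryParser.escape("one/two")
        String seenByAnalyzer = unescape(escaped); // the text one/two

        // Mapping the escaped form never matches what the analyzer receives.
        System.out.println(analyze(seenByAnalyzer, "\\/", " ")); // [one/two]

        // Mapping the bare slash matches.
        System.out.println(analyze(seenByAnalyzer, "/", " "));   // [one, two]

        // Running the analyzer directly on the escaped string hides the problem.
        System.out.println(analyze(escaped, "\\/", " "));        // [one, two]
    }
}
```

    The third call mirrors debugging the analyzer by hand with the escaped string: the "\/" mapping matches there, but it never matches on the text the parser actually passes along.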