Lucene: How to search by specific term


I'm trying to run a Lucene search for one specific string term.
For example, I have the tags 1-"Hello World", 2-"Hello, Steve", 3-"Helloween" and 4-"Hello". If I search for the last tag (hello), Lucene returns all of the tags, because every one of them contains "hello" at some point. I need an operator or some logic that performs the search without "like" semantics.

One way to avoid this is the "must_not" clause (the - operator), which turns the query into: term:hello -term:world. But that does not work in my case, because I would need to know every other word that should be excluded from the search.

private <T> Query createQuery(final Class<T> clazz, String s, final String[] fields, final SearchFactory searchFactory, final Boolean allowLeadingWildcard) throws ParseException {
    final Analyzer analyzer = searchFactory.getAnalyzer(clazz);
    final QueryParser parser = new MultiFieldQueryParser(Version.LUCENE_36, fields, analyzer);
    Query query = null;
    try{
        query = parser.parse(s);
    } catch(...){...}
    return query;
}

My knowledge of Lucene is limited, so here is an SQL example in case it is easier to understand:

/* This is what Lucene is doing. It will bring "HELLO", "HELLO WORLD", "Hello, Steve"... */
WHERE table.tag LIKE '%HELLO%'
/* This is what I want. Match exactly the term "HELLO" and nothing more */
WHERE table.tag = 'HELLO'

I guess that this is the Analyzer used in the application:

public class AnalyserCustom extends Analyzer {

    @Override
    public TokenStream tokenStream(final String fieldName, final Reader reader) {
        final StandardTokenizer tokenizer = new StandardTokenizer(Version.LUCENE_36, reader);

        TokenStream stream = new StandardFilter(Version.LUCENE_36, tokenizer);
        stream = new LowerCaseFilter(Version.LUCENE_36, stream);
        return new ASCIIFoldingFilter(stream);
    }
}

And the tag attribute is declared like this:

...
@Field
private String tagname;
...

Any suggestions?
PS: I'm new to Lucene.


Solution

  • You have to index the field with an analyzer that generates one single token for the whole string, so that a search only matches the complete value. Try KeywordAnalyzer.
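
To see why the choice of analyzer matters, here is a minimal sketch in plain Java. It is a toy model of an inverted index, not Lucene's API: the class TagIndexSketch and its methods are hypothetical names made up for illustration. The "tokenized" mode only roughly imitates a tokenizer followed by a lower-case filter, and the "keyword" mode roughly imitates what KeywordAnalyzer does, which is to keep the whole value as one token.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy inverted index (hypothetical, NOT Lucene's API) that shows how the
// analysis step decides which tags a single-term lookup can reach.
class TagIndexSketch {

    // token -> tags whose analysis produced that token
    private final Map<String, Set<String>> postings = new HashMap<>();
    private final boolean keywordMode;

    TagIndexSketch(boolean keywordMode) {
        this.keywordMode = keywordMode;
    }

    // keywordMode: the whole value is one token, unchanged.
    // otherwise: lower-case the value and split it on non-alphanumeric characters.
    List<String> analyze(String value) {
        List<String> tokens = new ArrayList<>();
        if (keywordMode) {
            tokens.add(value);
            return tokens;
        }
        for (String part : value.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Index a tag under every token its analysis produces.
    void add(String tag) {
        for (String token : analyze(tag)) {
            postings.computeIfAbsent(token, k -> new LinkedHashSet<>()).add(tag);
        }
    }

    // Exact lookup of one token, similar in spirit to a single-term query.
    Set<String> search(String term) {
        return postings.getOrDefault(term, new LinkedHashSet<>());
    }

    public static void main(String[] args) {
        String[] tags = {"Hello World", "Hello, Steve", "Helloween", "Hello"};
        TagIndexSketch tokenized = new TagIndexSketch(false);
        TagIndexSketch keyword = new TagIndexSketch(true);
        for (String tag : tags) {
            tokenized.add(tag);
            keyword.add(tag);
        }
        System.out.println("tokenized, 'hello': " + tokenized.search("hello").size() + " tags");
        System.out.println("keyword,   'Hello': " + keyword.search("Hello").size() + " tag(s)");
    }
}
```

In this sketch the tokenized index finds three tags for "hello" ("Hello World", "Hello, Steve" and "Hello"), while the keyword index finds only "Hello" when asked for "Hello". Note that the single-token approach is case-sensitive here: looking up "hello" in the keyword index finds nothing, so the stored value and the search string have to be normalized the same way if case should not matter.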