Tags: c#, lucene, lucene.net, luke

Luke Lucene BooleanQuery


In Luke, the following search expression returns 23 results:

docurl:www.siteurl.com  docfile:Tomatoes*

If I pass this same expression into my C# Lucene.NET app with the following implementation:

        IndexReader reader = IndexReader.Open(indexName);
        Searcher searcher = new IndexSearcher(reader);
        try
        {
            QueryParser parser = new QueryParser("docurl", new StandardAnalyzer());
            BooleanQuery bquery = new BooleanQuery();
            Query parsedQuery = parser.Parse(query);
            bquery.Add(parsedQuery, Lucene.Net.Search.BooleanClause.Occur.MUST);
            int _max = searcher.MaxDoc();
            BooleanQuery.SetMaxClauseCount(Int32.MaxValue);
            // Note: Search runs parsedQuery directly; bquery above is never used.
            TopDocs hits = searcher.Search(parsedQuery, _max);
            ...
        }

I get 0 results.

Luke is using StandardAnalyzer, and this is what the Explain Structure window looks like: (screenshot: Luke Query Structure)

Must I manually create BooleanClause objects for each field I search on, specifying SHOULD for each one, and then add them to the BooleanQuery object with .Add()? I thought the QueryParser would do this for me. What am I missing?
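For comparison, here is a minimal sketch of building by hand roughly what QueryParser produces for the two-field expression under its default OR operator: one BooleanQuery holding two SHOULD clauses. It assumes the Lucene.Net 2.9.x API that the question's code appears to use (BooleanClause.Occur, Java-style methods); ManualQueries and UrlOrFilePrefix are hypothetical names. One difference worth noticing: a PrefixQuery constructed directly keeps its prefix exactly as given.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;

// Sketch only: assumes the Lucene.Net 2.9.x API (member names differ in 3.0.x).
public static class ManualQueries
{
    // Hypothetical helper: roughly the query QueryParser builds for
    // "docurl:<url> docfile:<prefix>*" with the default OR operator.
    public static BooleanQuery UrlOrFilePrefix(string url, string filePrefix)
    {
        BooleanQuery query = new BooleanQuery();

        // Exact term match on the URL; no analyzer touches this text.
        query.Add(new TermQuery(new Term("docurl", url)), BooleanClause.Occur.SHOULD);

        // A PrefixQuery built directly keeps the prefix's case ("Tomatoes"),
        // whereas QueryParser lowercases prefix terms by default.
        query.Add(new PrefixQuery(new Term("docfile", filePrefix)), BooleanClause.Occur.SHOULD);

        return query;
    }
}
```

Usage: ManualQueries.UrlOrFilePrefix("www.siteurl.com", "Tomatoes") can be passed straight to searcher.Search(query, n) in place of the parsed query.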

Edit: Simplifying a tad, docfile:Tomatoes* returns 23 docs in Luke, yet 0 in my app. Per Gene's suggestion, I've changed from MUST to SHOULD:

            QueryParser parser = new QueryParser("docurl", new StandardAnalyzer());
            BooleanQuery bquery = new BooleanQuery();
            Query parsedQuery = parser.Parse(query);
            bquery.Add(parsedQuery, Lucene.Net.Search.BooleanClause.Occur.SHOULD);
            int _max = searcher.MaxDoc();
            BooleanQuery.SetMaxClauseCount(Int32.MaxValue);
            // bquery is never passed to Search, so switching MUST to SHOULD has no effect here.
            TopDocs hits = searcher.Search(parsedQuery, _max);

parsedQuery is simply docfile:tomatoes*

Edit2:

I think I've finally gotten to the root problem:

            QueryParser parser = new QueryParser("docurl", new StandardAnalyzer());
            Query parsedQuery = parser.Parse(query);

In the second line, query is "docfile:Tomatoes*", but parsedQuery is {docfile:tomatoes*}. Notice the difference? The 't' is lowercase in the parsed query. I hadn't noticed this before. If I change it back to 'T' in the IDE, 23 results are returned.
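The mismatch can be modeled without Lucene at all. The toy sketch below is plain C#, not Lucene code: it treats a field's terms as exact, case-sensitive strings, assumes the indexed docfile terms kept their capital T, and mimics QueryParser lowercasing the prefix text before the lookup. ToyPrefixMatch, its members, and the file names are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model (not Lucene): terms are compared exactly, like an index's term dictionary.
public static class ToyPrefixMatch
{
    // Hypothetical docfile terms, assumed to be stored with their original case.
    public static readonly string[] DocfileTerms =
    {
        "Tomatoes-2008.pdf", "Tomatoes-2009.pdf", "Potatoes.pdf"
    };

    // Counts terms that start with the prefix using a case-sensitive (ordinal) comparison.
    public static int CountPrefixMatches(IEnumerable<string> terms, string prefix)
    {
        return terms.Count(t => t.StartsWith(prefix, StringComparison.Ordinal));
    }

    // Mimics QueryParser's default of lowercasing the text of prefix/wildcard terms.
    public static string ParseExpandedTerm(string prefix, bool lowercaseExpandedTerms)
    {
        return lowercaseExpandedTerms ? prefix.ToLowerInvariant() : prefix;
    }
}
```

With lowercasing on, CountPrefixMatches(DocfileTerms, ParseExpandedTerm("Tomatoes", true)) finds nothing; with it off, the two Tomatoes entries match, mirroring the 0 versus 23 results described above.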

I've verified that StandardAnalyzer is being used both when indexing and when reading the index. How do I force QueryParser to preserve the case of the query text?

Edit3: Wow, how frustrating. According to the documentation, I can accomplish this with:

parser.SetLowercaseExpandedTerms(false);

Whether terms of wildcard, prefix, fuzzy and range queries are to be automatically lower-cased or not. Default is true.

I won't argue whether that's a sensible default or not. I suppose SimpleAnalyzer should have been used to lowercase everything in and out of the index. The frustrating part is, at least with the version I'm using, Luke defaults the other way! At least I learned a bit more about Lucene.
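A self-contained way to confirm the fix is an in-memory index. This is a sketch assuming the Lucene.Net 2.9.x API (member names may differ in 3.0.x) and assuming docfile was indexed with Field.Index.NOT_ANALYZED. That assumption matters: StandardAnalyzer lowercases the tokens of an analyzed field, so a capitalized term like Tomatoes.pdf suggests a field that was not run through StandardAnalyzer at index time, such as an untokenized one. CaseSensitivePrefixDemo and the file name are hypothetical.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Sketch only: Lucene.Net 2.9.x API assumed.
public static class CaseSensitivePrefixDemo
{
    // One-document index; docfile is untokenized, so its term keeps the capital T.
    public static Lucene.Net.Store.Directory BuildIndex()
    {
        Lucene.Net.Store.Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true,
                                             IndexWriter.MaxFieldLength.UNLIMITED);
        Document doc = new Document();
        doc.Add(new Field("docfile", "Tomatoes.pdf", Field.Store.YES, Field.Index.NOT_ANALYZED));
        writer.AddDocument(doc);
        writer.Close();
        return dir;
    }

    // Parses queryText the way the question's app does and returns the hit count.
    public static int CountHits(Lucene.Net.Store.Directory dir, string queryText, bool lowercaseExpandedTerms)
    {
        QueryParser parser = new QueryParser("docurl", new StandardAnalyzer());
        // Default is true, which turns the prefix "Tomatoes" into "tomatoes".
        parser.SetLowercaseExpandedTerms(lowercaseExpandedTerms);
        Query query = parser.Parse(queryText);

        IndexSearcher searcher = new IndexSearcher(dir, true); // read-only searcher
        try
        {
            return searcher.Search(query, 10).totalHits;
        }
        finally
        {
            searcher.Close();
        }
    }
}
```

Under these assumptions, CountHits(dir, "docfile:Tomatoes*", true) should return 0 and the same call with false should return 1.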


Solution

  • QueryParser will indeed take a query like "docurl:www.siteurl.com docfile:Tomatoes*" and build the appropriate query object from it (BooleanQuery, RangeQuery, PrefixQuery, etc.), depending on the syntax used (see query syntax).

    Your first step should be to attach a debugger and inspect the value and type of parsedQuery.
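Besides inspecting it in the debugger, the same information can be printed. A minimal sketch, again assuming the Lucene.Net 2.9.x API and the question's parser settings (default field docurl, StandardAnalyzer); QueryInspector is a hypothetical name. Query.ToString() shows the terms as they will actually be searched, which is where a lowercased prefix becomes visible.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

// Sketch only: Lucene.Net 2.9.x API assumed.
public static class QueryInspector
{
    // Parses queryText like the question's app and reports the runtime
    // query type together with its string form.
    public static string Describe(string queryText)
    {
        QueryParser parser = new QueryParser("docurl", new StandardAnalyzer());
        Query parsed = parser.Parse(queryText);
        return parsed.GetType().Name + ": " + parsed.ToString();
    }
}
```

For docfile:Tomatoes* this should name a PrefixQuery, and for the original two-field expression a BooleanQuery.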