I've created a query with two MUST clauses (and zero SHOULD clauses) that is returning results that satisfy only one of the clauses. As far as I can tell, this is incorrect behavior.
An example of such a query, as it looks just before searching, is:
{+(Text:wba) +(Attribute:10)}
The incorrect results being returned have 'wba' as a term in the 'Text' field, but do not have '10' as a term in the 'Attribute' field.
When I open my index in Luke, go to the Search tab, and run this search:
+Text:wba +Attribute:10
I get no results, as I would expect.
Here's a slightly simplified version of the code to run the search:
public static ScoreDoc[] Search( string searchPhrase, int maxResults, IEnumerable<string> attributes ) {
    var topQuery = new BooleanQuery();
    var textQuery = new BooleanQuery();
    using( var ngAnalyzer = new NGramAnalyzer( Version.LUCENE_30, 3, 9 ) ) {
        using( var stAnalyzer = new StandardAnalyzer( Version.LUCENE_30, new HashSet<string>() ) ) {
            var ngParser = new QueryParser( Version.LUCENE_30, IndexManager.TextFieldName, ngAnalyzer );
            var stParser = new QueryParser( Version.LUCENE_30, IndexManager.TextFieldName, stAnalyzer );
            var terms = AutoCompleter.QueryToTerms( searchPhrase );
            foreach( var word in terms ) {
                if( string.IsNullOrWhiteSpace( word ) ) {
                    continue;
                }
                if( word.Length < 3 ) {
                    textQuery.Add( stParser.Parse( word ), Occur.MUST );
                } else {
                    var parsed = ngParser.Parse( word );
                    var extractedTerms = new HashSet<Term>();
                    parsed.ExtractTerms( extractedTerms );
                    foreach( var term in extractedTerms ) {
                        textQuery.Add( new TermQuery( term ), Occur.SHOULD );
                    }
                }
            }
        }
    }
    topQuery.Add( textQuery, Occur.MUST );
    if( attributes != null && attributes.Any() ) {
        var attrQuery = new BooleanQuery();
        foreach( var attr in attributes ) {
            attrQuery.Add( new TermQuery( new Term( IndexManager.AttributeFieldName, attr ) ), Occur.SHOULD );
        }
        topQuery.Add( attrQuery, Occur.MUST );
    }
    // Actually conduct the search
    var searcher = AutoCompleter.IndexManager.GetOrCreateSearcher( AutoCompleter.TableId );
    var resultDocs = searcher.Search( textQuery, maxResults ).ScoreDocs;
    return resultDocs;
}
Here's an excerpt from the code that produces the index:
// Add the new document
var doc = new Document();
var field = new Field( IndexManager.TextFieldName, term.Text, Field.Store.YES, Field.Index.ANALYZED );
doc.Add( field );
if( !String.IsNullOrWhiteSpace( term.Id ) ) {
    field = new Field( IndexManager.IdFieldName, term.Id, Field.Store.YES, Field.Index.NO );
    doc.Add( field );
}
foreach( var attr in term.Attributes ) {
    if( !String.IsNullOrWhiteSpace( attr ) ) {
        field = new Field( IndexManager.AttributeFieldName, attr, Field.Store.YES, Field.Index.NOT_ANALYZED );
        doc.Add( field );
    }
}
writer.AddDocument( doc );
So, to be clear, I'm expecting only results that match the text clause inside textQuery and at least one of the attribute clauses held in attrQuery. Why isn't this working the way I expect?
This line is wrong. It searches with textQuery, so the attribute clause that was added to topQuery is never applied:
var resultDocs = searcher.Search( textQuery, maxResults ).ScoreDocs;
It should be:
var resultDocs = searcher.Search( topQuery, maxResults ).ScoreDocs;
Whoops.
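For anyone who wants to see the difference in isolation, here is a minimal sketch, assuming Lucene.Net 3.0.x and an in-memory index. The class name MustClauseDemo, the helper MakeDoc, and the two sample documents are made up for illustration; only the field names Text and Attribute and the values wba and 10 come from the question. It runs the inner text query and the combined top-level query against the same index and returns both hit counts.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

public static class MustClauseDemo {
    // Hypothetical helper: one document with an exact-match Text term
    // and an exact-match Attribute term (both NOT_ANALYZED for simplicity).
    private static Document MakeDoc( string text, string attr ) {
        var doc = new Document();
        doc.Add( new Field( "Text", text, Field.Store.YES, Field.Index.NOT_ANALYZED ) );
        doc.Add( new Field( "Attribute", attr, Field.Store.YES, Field.Index.NOT_ANALYZED ) );
        return doc;
    }

    // Returns { hits for textQuery alone, hits for topQuery }.
    public static int[] Run() {
        using( var dir = new RAMDirectory() ) {
            // Sample data (assumed): both documents match Text:wba,
            // but only the first one has Attribute:10.
            using( var writer = new IndexWriter( dir, new WhitespaceAnalyzer(), IndexWriter.MaxFieldLength.UNLIMITED ) ) {
                writer.AddDocument( MakeDoc( "wba", "10" ) );
                writer.AddDocument( MakeDoc( "wba", "20" ) );
            }

            // Same shape as the question's queries: +(Text:wba) +(Attribute:10)
            var textQuery = new BooleanQuery();
            textQuery.Add( new TermQuery( new Term( "Text", "wba" ) ), Occur.MUST );

            var attrQuery = new BooleanQuery();
            attrQuery.Add( new TermQuery( new Term( "Attribute", "10" ) ), Occur.SHOULD );

            var topQuery = new BooleanQuery();
            topQuery.Add( textQuery, Occur.MUST );
            topQuery.Add( attrQuery, Occur.MUST );

            using( var searcher = new IndexSearcher( dir, true ) ) {
                // Searching with the inner query ignores the attribute clause.
                var textOnly = searcher.Search( textQuery, 10 ).ScoreDocs.Length;
                // Searching with the outer query requires both clauses.
                var combined = searcher.Search( topQuery, 10 ).ScoreDocs.Length;
                return new[] { textOnly, combined };
            }
        }
    }

    public static void Main() {
        var counts = Run();
        Console.WriteLine( "textQuery hits: " + counts[0] );
        Console.WriteLine( "topQuery hits:  " + counts[1] );
    }
}
```

With this sample data, textQuery should match both documents while topQuery should match only the one with Attribute 10, which is the same symptom as in the question. It also explains why Luke returned nothing: there the full two-clause query was what actually got executed.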