c# .net boost lucene scoring

Lucene - Scoring effect on clause count


I had some issues with Lucene: it always returned a constant score and ignored my boost values.

Setting the parser's rewrite method to SCORING_BOOLEAN_QUERY_REWRITE did the trick, but it has a weird side effect on the clause count that I don't quite understand.

With constant scoring I have no issues with maxClauseCount, which is 1024 by default. With dynamic scoring the clause count quickly exceeds 1024, and I wonder why that is.

Does anyone know the technical details of this?

In another post someone mentioned that a query like 'ca*' is rewritten to 'car OR cars'. But shouldn't that always be the case, whether you use constant or dynamic scoring?

Thanks in advance!

Edit: So here's my solution. I ran into some problems because the document boost I set when the doc was created was always 1.0 when I retrieved the doc later. Maybe it's a bug, I'm not sure. What I do know is that when you get a document from the searcher, the document object is newly created and its boost value is never set, only its fields. It could be related to the C# port. Anyway, I wrote a CustomScoreQuery that wraps the original query and multiplies the score by my initial boost value, which I store in a document field (a nasty workaround, I know).

Enough talk, here's my code. I'm open to improvements, especially a way to get the original boost value without needing a searcher or a field.

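using System;
using Lucene.Net.Documents;
using Lucene.Net.Search;
using Lucene.Net.Search.Function;

// Wraps a query and multiplies each hit's score by a priority stored in the document.
public class DynamicBoostingQuery : CustomScoreQuery
{
    // Searcher used to load the stored fields of each scored document.
    private readonly Searcher s;

    public DynamicBoostingQuery(Query q, Searcher searcher)
        : base(q)
    {
        this.s = searcher;
    }

    public override float CustomScore(int doc, float subQueryScore, float valSrcScore)
    {
        // Start from the score the wrapped query would have produced.
        float val = base.CustomScore(doc, subQueryScore, valSrcScore);

        try
        {
            // Note: this loads the stored document for every scored hit, which is slow.
            Document d = s.Doc(doc);

            // The boost is kept in a stored field because Document.Boost is not restored on retrieval.
            float priority = float.Parse(d.Get("raw_categoryPriority"));

            return val * priority;
        }
        catch (Exception)
        {
            // Field missing or unparsable, or the document could not be loaded: keep the unboosted score.
            return val;
        }
    }
}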

Solution

  • MultiTermQuery's default (in Lucene 3.5 on Java; I don't know the exact version in which this was introduced) is CONSTANT_SCORE_AUTO_REWRITE_DEFAULT. It uses CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE only up to a defined threshold of clauses and hits, and beyond that switches to CONSTANT_SCORE_FILTER_REWRITE, which never raises TooManyClauses. You overrode that default and forced Lucene to use a BooleanQuery rewrite, which adds one clause per matching term and is therefore always subject to maxClauseCount. Unfortunately, there is no option to use a Filter-based rewrite if you need the score.

    Maybe you can try using CustomScoreQuery to recover your document boosts.
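
To make the difference between the two rewrite strategies concrete, here is a minimal toy model in plain C#. It is not Lucene or Lucene.Net code: the class RewriteSketch, its methods, and the AutoTermCutoff value are hypothetical names and numbers chosen for illustration, and the model ignores the hit-count part of the real threshold. It only shows why a wildcard that expands to many terms is harmless under the constant-score auto rewrite but hits the clause limit under a scoring BooleanQuery rewrite.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model of multi-term query rewriting. NOT Lucene code; all names are hypothetical.
public static class RewriteSketch
{
    public const int MaxClauseCount = 1024; // same value as the default limit in the question
    public const int AutoTermCutoff = 350;  // assumed cutoff, for illustration only

    // Collect the index terms that a prefix query such as "ca*" expands to.
    public static List<string> Expand(IEnumerable<string> indexTerms, string prefix)
    {
        return indexTerms
            .Where(t => t.StartsWith(prefix, StringComparison.Ordinal))
            .ToList();
    }

    // Scoring rewrite: every matching term becomes its own scored clause,
    // so the clause limit always applies.
    public static string ScoringBooleanRewrite(List<string> terms)
    {
        if (terms.Count > MaxClauseCount)
            throw new InvalidOperationException("TooManyClauses: " + terms.Count);
        return "BooleanQuery";
    }

    // Constant-score auto rewrite: small expansions become a BooleanQuery,
    // large ones fall back to a filter that needs no clauses at all.
    public static string ConstantScoreAutoRewrite(List<string> terms)
    {
        return terms.Count <= AutoTermCutoff ? "BooleanQuery" : "Filter";
    }
}
```

With an index of 2000 terms that start with "ca", ConstantScoreAutoRewrite picks the filter path and never looks at the clause limit, while ScoringBooleanRewrite throws because 2000 clauses exceed 1024. Both strategies expand 'ca*' to the same terms; only the scoring one has to hold them all as clauses.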