Tags: lucene, sitecore, lucene.net, sitecore8, predicatebuilder

Sitecore Search Predicate Builder multiple keyword search with boosting not working as desired


I have Sitecore pages / Lucene documents with the following fields:

  • Title
  • Filename
  • Content
  • File Contents

I'm creating a search for these and have the following requirements:

  • Hits containing the whole phrase in the title field should be returned first.
  • Hits containing the whole phrase in the filename field should be returned second.
  • Hits containing the whole phrase in the content should be returned third.
  • Hits containing the whole phrase in the file contents should be returned fourth.
  • Hits containing all of the keywords (in any order) in the title field should be returned fifth.
  • Hits containing all of the keywords (in any order) in the filename field should be returned sixth.
  • Hits containing all of the keywords (in any order) in the content should be returned seventh.
  • Hits containing all of the keywords (in any order) in the file contents should be returned eighth.

Here is what I've got:

    public static Expression<Func<T, bool>> GetSearchTermPredicate<T>(string searchTerm) 
        where T : ISearchableItem
    {
        var actualPhrasePredicate = PredicateBuilder.True<T>()
            .Or(r => r.Title.Contains(searchTerm).Boost(2f))
            .Or(r => r.FileName.Contains(searchTerm).Boost(1.5f))
            .Or(r => r.Content.Contains(searchTerm))
            .Or(r => r.DocumentContents.Contains(searchTerm));

        var individualWordsPredicate = PredicateBuilder.False<T>();

        foreach (var term in searchTerm.Split(' '))
        {
            individualWordsPredicate 
                = individualWordsPredicate.And(r => 
                   r.Title.Contains(term).Boost(2f)
                || r.FileName.Contains(term).Boost(1.5f)
                || r.Content.Contains(term)
                || r.DocumentContents.Contains(term));
        }

        return PredicateBuilder.Or(actualPhrasePredicate.Boost(2f), 
            individualWordsPredicate);
    }

The actual phrase part seems to work well. Hits with the full phrase in the title are returned first. However, if I remove a word from the middle of the phrase, no results are returned.

For example, I have a page titled "The England football team are dreadful", but searching for "The football team are dreadful" doesn't find it.

Note: pages can have documents attached to them, so I want to boost the filenames too but not as highly as the page title.


Solution

  • I managed to get this to work with the following:

        public static Expression<Func<T, bool>> GetSearchTermPredicate<T>(string searchTerm) 
            where T : ISearchableItem
        {
            var actualPhraseInTitlePredicate = PredicateBuilder.True<T>()
                .And(r => r.Title.Contains(searchTerm));
    
            var actualPhraseInFileNamePredicate = PredicateBuilder.True<T>()
                .And(r => r.FileName.Contains(searchTerm));
    
            var actualPhraseInContentPredicate = PredicateBuilder.True<T>()
                .And(r => r.Content.Contains(searchTerm));
    
            var actualPhraseInDocumentPredicate = PredicateBuilder.True<T>()
                .And(r => r.DocumentContents.Contains(searchTerm));
    
            var terms = searchTerm.Split(' ');
    
            var titleContainsAllTermsPredicate = PredicateBuilder.True<T>();
    
            foreach (var term in terms)
                titleContainsAllTermsPredicate 
                    = titleContainsAllTermsPredicate.And(r => r.Title.Contains(term).Boost(2f));
    
            var fileNameAllTermsContains = PredicateBuilder.True<T>();
    
            foreach (var term in terms)
                fileNameAllTermsContains 
                    = fileNameAllTermsContains.And(r => r.FileName.Contains(term));
    
            var contentContainsAllTermsPredicate = PredicateBuilder.True<T>();
    
            foreach (var term in terms)
                contentContainsAllTermsPredicate 
                    = contentContainsAllTermsPredicate.And(r => r.Content.Contains(term));
    
            var documentContainsAllTermsPredicate = PredicateBuilder.True<T>();
    
            foreach (var term in terms)
                documentContainsAllTermsPredicate 
                    = documentContainsAllTermsPredicate.And(r => r.DocumentContents.Contains(term));
    
    
            var predicate = actualPhraseInTitlePredicate.Boost(3f)
                .Or(actualPhraseInFileNamePredicate.Boost(2.5f))
                .Or(actualPhraseInContentPredicate.Boost(2f))
                .Or(actualPhraseInDocumentPredicate.Boost(1.5f))
                .Or(titleContainsAllTermsPredicate.Boost(1.2f))
                .Or(fileNameAllTermsContains.Boost(1.2f))
                .Or(contentContainsAllTermsPredicate)
                .Or(documentContainsAllTermsPredicate);
    
            return predicate;
        }
    

    It's quite a bit more code, but I think giving each field its own predicate makes more sense, because each predicate can then carry its own boost.

    The previous code had two problems:

    1. PredicateBuilder.Or(actualPhrasePredicate.Boost(2f), individualWordsPredicate) doesn't seem to include the second predicate. When I called .ToString() on the combined predicate, the expression contained nothing from individualWordsPredicate.
    2. After fixing that it still didn't work, because I had seeded individualWordsPredicate with PredicateBuilder.False<T>(). The resulting expression was effectively (False AND Field.Contains(keyword)), which can never evaluate to true. Seeding with PredicateBuilder.True<T>() fixed this.
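
The second point can be reproduced outside Sitecore. The following is a minimal sketch, assuming a simplified in-memory stand-in for the predicate builder: DemoPredicateBuilder and AllTerms are hypothetical names invented for this illustration, not Sitecore's classes, and the predicates are compiled and run against a plain string rather than a Lucene index. It only shows why an AND chain seeded with False can never match, while the same chain seeded with True can.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-in for a predicate builder (NOT Sitecore's class).
// It seeds a predicate with a constant and ANDs further predicates onto it.
public static class DemoPredicateBuilder
{
    public static Expression<Func<T, bool>> True<T>() { return item => true; }
    public static Expression<Func<T, bool>> False<T>() { return item => false; }

    public static Expression<Func<T, bool>> And<T>(
        this Expression<Func<T, bool>> left, Expression<Func<T, bool>> right)
    {
        // Invoke the right-hand predicate with the left-hand predicate's parameter
        // so both sides test the same item.
        var invoked = Expression.Invoke(right, left.Parameters.Cast<Expression>());
        return Expression.Lambda<Func<T, bool>>(
            Expression.AndAlso(left.Body, invoked), left.Parameters);
    }
}

public static class Program
{
    // Builds "seed AND Contains(term1) AND Contains(term2) ..." over a title string.
    public static Expression<Func<string, bool>> AllTerms(
        Expression<Func<string, bool>> seed, string[] terms)
    {
        var predicate = seed;
        foreach (var term in terms)
        {
            var current = term; // local copy captured by the lambda below
            predicate = predicate.And(title => title.Contains(current));
        }
        return predicate;
    }

    public static void Main()
    {
        var title = "The England football team are dreadful";
        var terms = "The football team are dreadful".Split(' ');

        var seededFalse = AllTerms(DemoPredicateBuilder.False<string>(), terms);
        var seededTrue = AllTerms(DemoPredicateBuilder.True<string>(), terms);

        Console.WriteLine(seededFalse.Compile()(title)); // prints "False"
        Console.WriteLine(seededTrue.Compile()(title));  // prints "True"
    }
}
```

The general rule this suggests is to seed AND chains with True and OR chains with False, so the seed is the neutral element of the operator and does not decide the result on its own.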