
Sitecore & Lucene search auto-complete using NGram


I'm trying to set up autocomplete for content search using n-grams. Here is my Lucene index configuration:

<autocompleteSearchConfiguration type="Sitecore.ContentSearch.LuceneProvider.LuceneIndexConfiguration, Sitecore.ContentSearch.LuceneProvider">
      <indexAllFields>false</indexAllFields>
      <initializeOnAdd>true</initializeOnAdd>
      <analyzer ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/analyzer" />
      <fieldMap type="Sitecore.ContentSearch.FieldMap, Sitecore.ContentSearch">
        <fieldNames hint="raw:AddFieldByFieldName">
          <field
            fieldName="page_title"
            storageType="YES"
            indexType="TOKENIZED"
            vectorType="NO"
            boost="1.5f"
            nullValue="NULL"
            emptyString="EMPTY"
            type="System.String"
            settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
            <analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.NGramAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
          </field>
        </fieldNames>
      </fieldMap>
      <fields hint="raw:AddComputedIndexField">
        <field fieldName="page_title" storageType="yes">Client.Website.Code.Search.AutoCompleteTitle, Client.Website</field>
      </fields>
      <fieldReaders ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/fieldReaders"/>
      <indexFieldStorageValueFormatter ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/indexFieldStorageValueFormatter"/>
      <indexDocumentPropertyMapper ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/indexDocumentPropertyMapper"/>
      <documentBuilderType>Sitecore.ContentSearch.LuceneProvider.LuceneDocumentBuilder, Sitecore.ContentSearch.LuceneProvider</documentBuilderType>
</autocompleteSearchConfiguration>

Notice that I am using the NGramAnalyzer (namespace: Sitecore.ContentSearch.LuceneProvider.Analyzers).

When I look at this index in Luke, I can see that it contains the expected data. However, the following IQueryable query doesn't return any results.

var index = ContentSearchManager.GetIndex("INDEX NAME GOES HERE");
using (var context = index.CreateSearchContext())
{
    var query = context.GetQueryable<AutocompleteSearchResult>().Where(i => i.PageTitle == term);
    var result = query.GetResults();
}

Solution

  • Why not use "StartsWith" instead of ==?

    See this article.

    Sitecore provides an n-gram analyzer for Lucene.net (Sitecore.ContentSearch.LuceneProvider.Analyzers). If you use Solr, you can set this up in the Solr Schema.xml file.

    You use the n-gram analyzer to create autocomplete functionality for search input. The analyzer breaks tokens up into unigrams, bigrams, trigrams, and so on. When a user types a word, the n-gram analyzer looks the word up in different positions, using the tokens that it generated.

    You add support for autocomplete by adding a new field to the index and mapping this field to use the n-gram analyzer instead of the default. When you run the LINQ query to query that field, use the following code:

    using (IProviderSearchContext context = Index.CreateSearchContext())
    {
        result = context.GetQueryable<SearchResultItem>()
            .Where(i => i.Name.StartsWith("some"))
            .Take(20)
            .ToList();
    }

    Sitecore provides an implementation that uses trigrams and a set of English stop words. If you have other requirements, you can build a new analyzer and change these settings.
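
To make the trigram splitting described above concrete, here is a minimal sketch in plain C#. It does not use Sitecore or Lucene.Net. The class name NGramSketch and the method NGrams are hypothetical names chosen for illustration, and the real analyzer may order its tokens differently and also applies stop-word filtering, which is left out here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of Sitecore or Lucene.Net.
public static class NGramSketch
{
    // Returns every substring of `token` whose length lies between
    // minGram and maxGram (inclusive), shortest sizes first.
    public static List<string> NGrams(string token, int minGram, int maxGram)
    {
        if (token == null) throw new ArgumentNullException(nameof(token));
        if (minGram < 1 || maxGram < minGram)
            throw new ArgumentOutOfRangeException(nameof(minGram));

        var grams = new List<string>();
        for (int size = minGram; size <= maxGram; size++)
        {
            // Slide a window of `size` characters across the token.
            for (int start = 0; start + size <= token.Length; start++)
            {
                grams.Add(token.Substring(start, size));
            }
        }
        return grams;
    }
}
```

Calling NGramSketch.NGrams("sitecore", 3, 3) yields sit, ite, tec, eco, cor, ore. If the index stores fragments like these rather than the whole title, that may be why an exact comparison with == against the full term finds nothing, while a prefix-style query has tokens to match.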