
Sitecore 8.2 Lucene search isn't indexing all terms in a computed field


I have a computed field that takes content from an item's sub-items and concatenates it into a new field in the item's index document.

When I step through the code in the debugger, I can see that the computed field returns the correct value. When I inspect the Sitecore-generated indexes with Luke, I can also see the computed field with the correct values. However, when I search for a term in the computed field with Luke (or in Sitecore), the search does not always return every document that contains that term.

I believe this may be related to items with multiple language versions. For example, one of the items has a Dutch version and a Serbian (Latin) version, and both contain the word "vooderlen" in their content. However, when I search for that term, only the Serbian document is returned. If I search for "assisteert", both documents are returned. I'm not sure why some terms are being ignored.

Here's the relevant code:

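using System.Text;
using System.Text.RegularExpressions;
using Sitecore.Configuration;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;
using Sitecore.Data;
using Sitecore.Data.Items;
using Sitecore.Diagnostics;

namespace My.Namespace.ComputedFields
{
    // Builds the "childcontent" index field from the Content Detail item referenced by a Detail Layout item.
    public class ChildContent : AbstractComputedIndexField
    {
        public override object ComputeFieldValue(IIndexable indexable)
        {
            Assert.ArgumentNotNull(indexable, "indexable");

            // Only Sitecore items are handled; other indexables get no value.
            var indexableItem = indexable as SitecoreIndexableItem;
            if (indexableItem == null)
            {
                return null;
            }

            Item item = indexableItem.Item;

            // Only compute child content for Detail Layout templates.
            // ID.ToString() returns the braced, upper-case GUID, so the setting must use that format.
            string detailLayoutTemplateId = Settings.GetSetting("DetailLayoutTemplateId");
            if (item.TemplateID.ToString() != detailLayoutTemplateId)
            {
                return null;
            }

            // "Content Detail" holds the ID of the item whose content gets indexed.
            string contentDetailId = item["Content Detail"];
            if (string.IsNullOrEmpty(contentDetailId))
            {
                return null;
            }

            // Load the Content Detail item in the same language as the item being indexed.
            Item contentDetailItem = item.Database.GetItem(ID.Parse(contentDetailId), item.Language);
            if (contentDetailItem == null)
            {
                return null;
            }

            // Concatenate intro and main content, replacing each HTML tag with a space.
            string introContent = contentDetailItem["Intro Content"];
            string mainContent = contentDetailItem["Main Content"];
            var valueString = new StringBuilder();

            if (!string.IsNullOrWhiteSpace(introContent))
            {
                valueString.Append(Regex.Replace(introContent, "<.*?>", " ") + " ");
            }

            if (!string.IsNullOrWhiteSpace(mainContent))
            {
                valueString.Append(Regex.Replace(mainContent, "<.*?>", " ") + " ");
            }

            return valueString.ToString();
        }
    }
}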
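The tag stripping and concatenation in ComputeFieldValue do not touch Sitecore, so they can be checked outside the indexing pipeline. Below is a minimal sketch that reproduces the same regex and concatenation in a hypothetical ChildContentText helper (the class name and sample strings are illustrative, not part of the original project), which makes it easy to see exactly which words end up in the field.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: mirrors the string-building logic of ChildContent.ComputeFieldValue
// without any Sitecore dependencies, so it can be run and tested on its own.
public static class ChildContentText
{
    public static string Build(string introContent, string mainContent)
    {
        var valueString = new StringBuilder();

        foreach (string html in new[] { introContent, mainContent })
        {
            // Skip empty fields, as the computed field does.
            if (string.IsNullOrWhiteSpace(html))
            {
                continue;
            }

            // Same non-greedy pattern as the original: every tag becomes a single space.
            valueString.Append(Regex.Replace(html, "<.*?>", " ") + " ");
        }

        return valueString.ToString();
    }

    public static void Main()
    {
        string value = Build("<p>foo</p><p>bar</p>", null);

        // Split on spaces to list the words the field value contains.
        string[] tokens = value.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        Console.WriteLine(string.Join("|", tokens)); // prints "foo|bar"
    }
}
```

Replacing each tag with a space rather than an empty string keeps "<p>foo</p><p>bar</p>" from collapsing into the single word "foobar". Note that in .NET the `.` in "<.*?>" does not match a newline by default, so a tag that spans a line break is left in place; RegexOptions.Singleline changes that.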

I'm also using the following search configuration.

<fieldMap type="Sitecore.ContentSearch.FieldMap, Sitecore.ContentSearch">
    <fieldNames hint="raw:AddFieldByFieldName">
        <field fieldName="_uniqueid"
            storageType="YES"
            indexType="TOKENIZED"
            vectorType="NO"
            boost="1f"
            type="System.String"
            settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
            <analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
        </field>
        <field fieldName="childcontent"
            storageType="YES"
            indexType="TOKENIZED"
            vectorType="NO"
            boost="1f"
            type="System.String"
            settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
        </field>
    </fieldNames>
</fieldMap>

<fields hint="raw:AddComputedIndexField">
    <field fieldName="childcontent">My.Namespace.ComputedFields.ChildContent, MyWebApp</field>
</fields>

Solution

  • I noticed that if I used the DutchAnalyzer in Luke, I was able to get back more results for the Dutch language. Unfortunately, Sitecore does not offer more than a few analyzers, and none are language specific unless you create a custom analyzer.

    Fortunately, this put me on the right track, and I looked into ways to get back results for the specific language I needed. By passing a CultureExecutionContext for the current language to the GetQueryable method, I was able to get the results I expected.

    var queryable = context.GetQueryable<SearchResultItem>(
        new CultureExecutionContext(
            CultureInfo.GetCultureInfo(Sitecore.Context.Language.ToString())
        )
    );
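Why passing the culture can matter is easier to see in a toy model. It assumes, as the DutchAnalyzer observation in Luke suggests, that the Dutch versions were indexed with a Dutch stemming analyzer, while a query without a CultureExecutionContext is analyzed with the default analyzer. Everything below is hypothetical: Standard, ToyDutch and its strip-a-trailing-"en" rule are stand-ins, not Lucene's StandardAnalyzer or the Snowball Dutch stemmer, and the culture names are only illustrative dictionary keys.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model only: shows why the query must be analyzed the same way as the indexed text.
public static class AnalyzerDemo
{
    // Default analysis (hypothetical): lowercase and split on spaces.
    public static IEnumerable<string> Standard(string text) =>
        text.ToLowerInvariant().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

    // Hypothetical "Dutch" analysis: Standard plus a crude rule that strips a trailing "en".
    public static IEnumerable<string> ToyDutch(string text) =>
        Standard(text).Select(t => t.Length > 4 && t.EndsWith("en") ? t.Substring(0, t.Length - 2) : t);

    // Culture name -> analyzer; any unmapped culture falls back to Standard.
    private static readonly Dictionary<string, Func<string, IEnumerable<string>>> PerCulture =
        new Dictionary<string, Func<string, IEnumerable<string>>> { ["nl-NL"] = ToyDutch };

    public static Func<string, IEnumerable<string>> For(string culture)
    {
        if (culture != null && PerCulture.TryGetValue(culture, out var analyzer))
        {
            return analyzer;
        }
        return Standard;
    }

    // A document matches when every analyzed query term is among its analyzed index terms.
    public static bool Matches(string docText, string docCulture, string query, string queryCulture)
    {
        var indexTerms = new HashSet<string>(For(docCulture)(docText));
        return For(queryCulture)(query).All(indexTerms.Contains);
    }

    public static void Main()
    {
        const string text = "vooderlen assisteert";
        Console.WriteLine(Matches(text, "nl-NL", "vooderlen", null));    // prints False: "vooderlen" vs indexed "vooderl"
        Console.WriteLine(Matches(text, "sr-Latn", "vooderlen", null));  // prints True: default analyzer on both sides
        Console.WriteLine(Matches(text, "nl-NL", "vooderlen", "nl-NL")); // prints True: both sides become "vooderl"
        Console.WriteLine(Matches(text, "nl-NL", "assisteert", null));   // prints True: the toy rule leaves it alone
    }
}
```

Under these assumptions the model reproduces the pattern from the question: the Serbian version matches "vooderlen", the Dutch version only matches once the query is analyzed for nl-NL, and "assisteert", which the toy rule does not change, matches either way.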