
Sitecore 8 XP ContentSearch: Exclude path from Indexing


I'm having trouble with Sitecore indexing of the general indexes "sitecore_master_index" and "sitecore_web_index": they take forever to index because the crawler/indexer checks every item in the database.

I imported thousands of products with a whole lot of specifications and literally have hundreds of thousands of items in the product repository.

If I could exclude that path from indexing, the crawler wouldn't have to check a million items for template exclusion.

FOLLOW-UP

I implemented a custom crawler that excludes a list of paths from being indexed:

<index id="sitecore_web_index" type="Sitecore.ContentSearch.SolrProvider.SwitchOnRebuildSolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
  <param desc="name">$(id)</param>
  <param desc="core">sitecore_web_index</param>
  <param desc="rebuildcore">sitecore_web_index_sec</param>
  <param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
  <configuration ref="contentSearch/indexConfigurations/defaultSolrIndexConfiguration" />
  <strategies hint="list:AddStrategy">
    <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync" />
  </strategies>
  <locations hint="list:AddCrawler">
    <crawler type="Sitecore.ContentSearch.Utilities.Crawler.ExcludePathsItemCrawler, Sitecore.ContentSearch.Utilities">
      <Database>web</Database>
      <Root>/sitecore</Root>
      <ExcludeItemsList hint="list">
        <ProductRepository>/sitecore/content/Product Repository</ProductRepository>
      </ExcludeItemsList>
    </crawler>
  </locations>
</index>

In addition, I activated SwitchOnRebuildSolrSearchIndex (the index type in the configuration above), as it's awesome out-of-the-box functionality. Cheers, SC.

The crawler class itself:

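using System;
using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.Diagnostics;

namespace Sitecore.ContentSearch.Utilities.Crawler
{
  // Crawler that skips every item located at or under one of the configured paths.
  public class ExcludePathsItemCrawler : SitecoreItemCrawler
  {
    private readonly List<string> excludeItemsList = new List<string>();

    // Populated from the <ExcludeItemsList hint="list"> element in the index configuration.
    public List<string> ExcludeItemsList
    {
      get
      {
        return excludeItemsList;
      }
    }

    protected override bool IsExcludedFromIndex(SitecoreIndexableItem indexable, bool checkLocation = false)
    {
      Assert.ArgumentNotNull(indexable, "indexable");
      // Append "/" to both sides so that only the excluded item and its descendants match,
      // not siblings whose names merely start with the same text.
      string itemPath = indexable.AbsolutePath.TrimEnd('/') + "/";
      if (ExcludeItemsList.Any(path => itemPath.StartsWith(path.TrimEnd('/') + "/", StringComparison.OrdinalIgnoreCase)))
      {
        return true;
      }
      return base.IsExcludedFromIndex(indexable, checkLocation);
    }
  }
}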

Solution

  • You can derive from the SitecoreItemCrawler class, which the index you want to change references in its crawler configuration:

    <locations hint="list:AddCrawler">
      <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
        <Database>master</Database>
        <Root>/sitecore</Root>
      </crawler>
    </locations>
    

    You can then add your own parameters, e.g. ExcludeTree or even a list of ExcludedBranches.

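    Then, in the implementation of your class, override the method

    public override bool IsExcludedFromIndex(IIndexable indexable)
    

    and check whether the item is under an excluded node. Check the signature against your Sitecore version; the follow-up above overrides the protected overload that takes a SitecoreIndexableItem.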
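
Applied to the master index, the crawler configuration might look like the sketch below. It assumes the ExcludePathsItemCrawler class and the assembly name used in the follow-up; adjust both to wherever your crawler is actually compiled. The excluded path is the one from the question, and the element name inside the list appears to serve only as a label.

```xml
<!-- Sketch only: the default master crawler element, switched to the custom
     crawler type from the follow-up. Type and assembly names are taken from
     that follow-up and are assumptions for any other project. -->
<locations hint="list:AddCrawler">
  <crawler type="Sitecore.ContentSearch.Utilities.Crawler.ExcludePathsItemCrawler, Sitecore.ContentSearch.Utilities">
    <Database>master</Database>
    <Root>/sitecore</Root>
    <!-- Each child element contributes one path to the ExcludeItemsList property. -->
    <ExcludeItemsList hint="list">
      <ProductRepository>/sitecore/content/Product Repository</ProductRepository>
    </ExcludeItemsList>
  </crawler>
</locations>
```

Further branches can be excluded by adding more child elements with distinct names to the list.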