Tags: lucene, sitecore, lucene.net, config, sitecore6

Unable to restrict custom Sitecore Lucene index to /sitecore/content/Home


I am trying to create a new Lucene index on a site running Sitecore 6.3.1. I used the existing "system" index as a guide, and I was able to create a new index on the web and master databases that indexes all items in the Sitecore content tree.

Where I am running into difficulty, however, is limiting which part of the content tree the database crawler indexes. Currently, the search index contains items from everywhere in the content tree (content items, media library items, layouts, templates, etc.). I would like to limit the index to only items in /sitecore/content/Home.

I have created a file at ~/App_Config/Include/Search Indexes/website.config, and I have pasted the relevant sections below:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <sitecore>

    <!-- This works as expected.... -->
    <databases>
      <database id="web">
        <indexes hint="list:AddIndex">
          <index path="indexes/index[@id='website']" />
        </indexes>
      </database>

      <!-- ... similar entry for master database goes here ... -->
    </databases>

    <!-- So does this.... -->
    <indexes>
      <index id="website" singleInstance="true" type="Sitecore.Data.Indexing.Index, Sitecore.Kernel">
        <param desc="name">$(id)</param>
        <fields hint="raw:AddField">
          <!-- ... field descriptions go here ... -->
        </fields>
      </index>
    </indexes>

    <!-- This works... mostly.  The "__website" directory does get created,
          but the Root directive is getting ignored.
    -->
    <search>
      <configuration type="Sitecore.Search.SearchConfiguration, Sitecore.Kernel" singleInstance="true">
        <indexes hint="list:AddIndex">
          <index id="website" singleInstance="true" type="Sitecore.Search.Index, Sitecore.Kernel">
            <param desc="name">$(id)</param>
            <param desc="folder">__$(id)</param>

            <Analyzer ref="search/analyzer" />

            <locations hint="list:AddCrawler">
              <web type="Sitecore.Search.Crawlers.DatabaseCrawler, Sitecore.Kernel">
                <Database>web</Database>
                <Root>/sitecore/content/home</Root>
                <Tags>content</Tags>
              </web>

              <!-- ... similar entry for master database goes here ... -->
            </locations>
          </index>
        </indexes>
      </configuration>
    </search>
  </sitecore>
</configuration>

A couple of notes:

  • This is not from my web.config file; I created a separate file so that I could distribute config changes via Sitecore packages.

  • The index was added to both master and web; I omitted the references to master for brevity.

  • Sitecore is definitely processing the entries for configuration/sitecore/search/configuration. I can see them when I go to http://localhost/sitecore/admin/showconfig.aspx, and if I change one of the tag values to something invalid (e.g., <Root>/nothere</Root>), Sitecore throws an exception on the next page load.

  • I have reviewed the index contents in IndexViewer, and the wrong items are definitely getting indexed (for example, document #0 in the index is the /sitecore node).

Where am I going wrong? What changes do I need to make to my configuration file to get the search indexer to ignore items outside /sitecore/content/Home?


Solution

  • I was able to solve the problem using the Advanced Database Crawler. Replacing the configuration/sitecore/search/configuration block with the configuration provided in Alex's presentation (see above link) made everything start to work, more or less automagically.
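The exact block from the presentation is not reproduced here. As a rough sketch only, the swap might look like the include file below, which keeps the index definition from the question and changes only the crawler's type attribute. The crawler type name (scSearchContrib.Crawler.Crawlers.AdvancedDatabaseCrawler in the scSearchContrib.Crawler assembly) is an assumption about the Advanced Database Crawler package, as is the assumption that it accepts the same Database, Root and Tags settings as the stock DatabaseCrawler; check both against the version you actually deploy to /bin. The presentation's configuration may also include extra settings not shown here.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <sitecore>
    <search>
      <configuration type="Sitecore.Search.SearchConfiguration, Sitecore.Kernel" singleInstance="true">
        <indexes hint="list:AddIndex">
          <index id="website" singleInstance="true" type="Sitecore.Search.Index, Sitecore.Kernel">
            <param desc="name">$(id)</param>
            <param desc="folder">__$(id)</param>

            <Analyzer ref="search/analyzer" />

            <locations hint="list:AddCrawler">
              <!-- ASSUMPTION: type name of the Advanced Database Crawler (scSearchContrib);
                   verify it against the assembly you install. -->
              <web type="scSearchContrib.Crawler.Crawlers.AdvancedDatabaseCrawler, scSearchContrib.Crawler">
                <Database>web</Database>
                <!-- Only items under this path should be crawled -->
                <Root>/sitecore/content/Home</Root>
                <Tags>content</Tags>
              </web>

              <!-- ... similar entry for master database goes here ... -->
            </locations>
          </index>
        </indexes>
      </configuration>
    </search>
  </sitecore>
</configuration>
```

After deploying a change like this, the index would typically need to be rebuilt before IndexViewer reflects the narrower root.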