I recognise this is a moot point on the web database, so this question applies to the master db...
I have a custom index set up in Sitecore 6.4.1 as follows:
<index id="search_content_US" type="Sitecore.Search.Index, Sitecore.Kernel">
  <param desc="name">$(id)</param>
  <param desc="folder">_search_content_US</param>
  <Analyzer ref="search/analyzer" />
  <locations hint="list:AddCrawler">
    <search_content_home type="Sitecore.Search.Crawlers.DatabaseCrawler, Sitecore.Kernel">
      <Database>master</Database>
      <Root>/sitecore/content/usa home</Root>
      <Tags>home content</Tags>
    </search_content_home>
  </locations>
</index>
I query the index like this (I am using techphoria414's SortableIndexSearchContext from this answer: How to sort/filter using the new Sitecore.Search API):
private SearchHits GetSearchResults(SortableIndexSearchContext searchContext, string searchTerm)
{
    CombinedQuery query = new CombinedQuery();
    query.Add(new FullTextQuery(searchTerm), QueryOccurance.Must);
    return searchContext.Search(query, Sort.RELEVANCE);
}
...
SearchHits hits = GetSearchResults(searchContext, searchTerm);
hits is a collection of search hits from my index. When I iterate through hits, I can see that there are many duplicates of the same items in Sitecore, one per version of the item.
I then do the following to get a SearchResultCollection:
SearchResultCollection results = hits.FetchResults(0, hits.Length);
This combines all of the duplicates into a single SearchResult object. This object represents one version of a particular item, and has a property called SubResults, which is a collection of SearchResults that represent all of the other item versions.
Here's my problem:
The version of the item represented by the SearchResult is NOT the current published version of the item! It appears to be a randomly selected version (whichever the search method hit first in the index). The latest version is included in the SubResults collection, however.
E.g.:
SearchResult
|
|- Version 8 // main result
...
|- SubResults
|
|- Version 9 // latest version
|- Version 3
|- Version 5
... // all versions in random order
How do I prevent this from happening on the master db? Either by preventing Lucene from indexing old versions of items, or by doing some manipulation of the result set to get the latest version from the SubResults?
As an aside, why does Lucene bother to index old versions of items anyway? Surely this is pointless for searching content on your website as the old versions are not visible?
You can implement a custom crawler that overrides the following:
public class IndexCrawler : DatabaseCrawler
{
    protected override void IndexVersion(Item item, Item latestVersion, Sitecore.Search.IndexUpdateContext context)
    {
        if (item.Versions.Count > 0 && item.Version.Number != latestVersion.Version.Number)
            return;
        base.IndexVersion(item, latestVersion, context);
    }
}
This ensures that only the latest version of an item gets into your index, and therefore it will be the only version pulled out of that index.
You would, of course, need to update your configuration file so that the crawler in your index definition uses this custom type instead of DatabaseCrawler.
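As a rough sketch of that configuration change, the index definition from the question might end up looking something like the following. The namespace `MyProject.Search` and the assembly name `MyProject` are placeholders assumed for illustration only; substitute wherever your IndexCrawler class actually lives.

```xml
<index id="search_content_US" type="Sitecore.Search.Index, Sitecore.Kernel">
  <param desc="name">$(id)</param>
  <param desc="folder">_search_content_US</param>
  <Analyzer ref="search/analyzer" />
  <locations hint="list:AddCrawler">
    <!-- "MyProject.Search.IndexCrawler, MyProject" is a hypothetical type name:
         replace it with the namespace and assembly of your own crawler class -->
    <search_content_home type="MyProject.Search.IndexCrawler, MyProject">
      <Database>master</Database>
      <Root>/sitecore/content/usa home</Root>
      <Tags>home content</Tags>
    </search_content_home>
  </locations>
</index>
```

Only the type attribute on the crawler element changes; the rest is the same as in the question. Entries for older versions that are already in the index will most likely stay there until the index is rebuilt.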