lucene sitecore sitecore7

Sitecore search synonyms file location


I've changed my DefaultIndexConfiguration config file to search based on synonyms (http://firebreaksice.com/sitecore-synonym-search-with-lucene/) and it works fine. However, this relies on an XML file in the filesystem:

<param hint="engine" type="Sitecore.ContentSearch.LuceneProvider.Analyzers.XmlSynonymEngine, Sitecore.ContentSearch.LuceneProvider">
   <param hint="xmlSynonymFilePath">C:\inetpub\wwwroot\website\Data\synonyms.xml</param>
</param>

What I'd like to do is make this data manageable in the CMS. Does anyone know how I can set the xmlSynonymFilePath parameter to achieve this? Or am I missing something?


Solution

  • The simplest solution would be to create an item in Sitecore (e.g. /sitecore/system/synonyms) based on a template with a single multi-line field called Synonyms, and keep the XML in this field instead of reading it from a file.
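
    For illustration, the Synonyms field might hold XML along these lines. The /synonyms/group layout is what the engine code in this answer selects; the syn element name and the sample words are assumptions, since that code reads the text of every child of a group regardless of its name:

    ```xml
    <synonyms>
      <group>
        <syn>fast</syn>
        <syn>quick</syn>
        <syn>rapid</syn>
      </group>
      <group>
        <syn>slow</syn>
        <syn>sluggish</syn>
      </group>
    </synonyms>
    ```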

    Then create your own implementation of ISynonymEngine like this (this is just the simplest example; it's NOT production-ready code):

    using System;
    using System.Collections.Generic;
    using System.Collections.ObjectModel;
    using System.Linq;
    using System.Xml;
    using Sitecore.Data;
    using Sitecore.Data.Items;
    using Sitecore.Diagnostics;

    public class CustomSynonymEngine : Sitecore.ContentSearch.LuceneProvider.Analyzers.ISynonymEngine
    {
        private readonly List<ReadOnlyCollection<string>> _synonymGroups = new List<ReadOnlyCollection<string>>();

        public CustomSynonymEngine()
        {
            Database database = Sitecore.Context.ContentDatabase ?? Sitecore.Context.Database ?? Database.GetDatabase("web");
            Item item = database.GetItem("/sitecore/system/synonyms"); // or whatever the path is
            if (item == null)
                throw new InvalidOperationException("The synonyms item was not found.");

            // The XML lives in the item's Synonyms field rather than in a file on disk.
            XmlDocument xmlDocument = new XmlDocument();
            xmlDocument.LoadXml(item["Synonyms"]);
            XmlNodeList xmlNodeList = xmlDocument.SelectNodes("/synonyms/group");

            if (xmlNodeList == null)
                throw new InvalidOperationException("There are no synonym groups in the Synonyms field.");

            // Each group element holds one set of interchangeable words, one per child node.
            foreach (XmlNode groupNode in xmlNodeList)
                _synonymGroups.Add(
                    new ReadOnlyCollection<string>(
                        groupNode.ChildNodes.Cast<XmlNode>().Select(synNode => synNode.InnerText.Trim().ToLower()).ToList()));
        }

        public IEnumerable<string> GetSynonyms(string word)
        {
            Assert.ArgumentNotNull(word, "word");
            // Return the whole group that contains the word, or null if no group does.
            foreach (ReadOnlyCollection<string> readOnlyCollection in _synonymGroups)
            {
                if (readOnlyCollection.Contains(word))
                    return readOnlyCollection;
            }
            return null;
        }
    }
    

    Then register your engine in the Sitecore configuration in place of the default engine:

    <analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.PerExecutionContextAnalyzer, Sitecore.ContentSearch.LuceneProvider">
      <param desc="defaultAnalyzer" type="Sitecore.ContentSearch.LuceneProvider.Analyzers.DefaultPerFieldAnalyzer, Sitecore.ContentSearch.LuceneProvider">
        <param desc="defaultAnalyzer" type="Sitecore.ContentSearch.LuceneProvider.Analyzers.SynonymAnalyzer, Sitecore.ContentSearch.LuceneProvider">
          <param hint="engine" type="My.Assembly.Namespace.CustomSynonymEngine, My.Assembly">
          </param>
        </param>
      </param>
    </analyzer>
    

    This is NOT production-ready code: it reads the list of synonyms only once, when the CustomSynonymEngine class is instantiated (I don't know whether Sitecore keeps that instance or creates new instances multiple times).

    You should extend this code to cache the synonyms and clear the cache every time the synonyms list changes.
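
    A minimal sketch of that caching idea, using only the .NET base class library so it can be tried outside Sitecore. SynonymCache, its loadXml delegate and Invalidate() are hypothetical names, not Sitecore APIs; the delegate stands in for reading the Synonyms field, and unlike the engine above this sketch lower-cases the looked-up word before matching:

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Xml.Linq;

    // Hypothetical helper (not part of Sitecore): parses the synonym XML on first
    // use and keeps the parsed groups until Invalidate() is called.
    public sealed class SynonymCache
    {
        private readonly object _sync = new object();
        private readonly Func<string> _loadXml; // assumption: returns the raw XML, e.g. the field value
        private string[][] _groups;             // null means "not loaded yet"

        public SynonymCache(Func<string> loadXml)
        {
            if (loadXml == null) throw new ArgumentNullException("loadXml");
            _loadXml = loadXml;
        }

        // Returns the group containing the word, or null when no group contains it.
        public IEnumerable<string> GetSynonyms(string word)
        {
            if (word == null) throw new ArgumentNullException("word");

            string[][] groups;
            lock (_sync)
            {
                if (_groups == null)
                    _groups = Parse(_loadXml());
                groups = _groups;
            }

            string key = word.Trim().ToLowerInvariant();
            return groups.FirstOrDefault(g => g.Contains(key));
        }

        // Drops the parsed groups; the next lookup reloads and reparses the XML.
        public void Invalidate()
        {
            lock (_sync)
            {
                _groups = null;
            }
        }

        // Reads every group element directly under the root, one word per child element.
        private static string[][] Parse(string xml)
        {
            return XDocument.Parse(xml).Root
                .Elements("group")
                .Select(g => g.Elements()
                    .Select(e => e.Value.Trim().ToLowerInvariant())
                    .ToArray())
                .ToArray();
        }
    }
    ```

    The custom engine could hold one shared instance of such a cache and delegate GetSynonyms to it. Invalidate() would then be called from whatever signals a change to the synonyms item, for example a handler for Sitecore's item:saved or publish:end events; that wiring is not shown here and depends on your setup.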

    You should also think about building a proper synonyms structure in the Sitecore tree instead of a single item holding an XML blob, which will be really hard to maintain.