I am using Lucene.net and am trying to implement a SynonymFilter to provide expanded terms when items in my database of products can be named or spelled differently, e.g. "spanner" > "wrench", or "lawnmower" > "lawn mower".
As a test I set up a SynonymMap as follows:
String base1 = "lawnmower";
String syn1 = "lawn mower";
String base2 = "spanner";
String syn2 = "wrench";
SynonymMap.Builder sb = new SynonymMap.Builder(true);
sb.Add(new CharsRef(base1), new CharsRef(syn1), true);
sb.Add(new CharsRef(base2), new CharsRef(syn2), true);
SynonymMap smap = sb.Build();
Searching for "spanner" or "wrench" brings back all terms containing either word. Searching for "lawn mower" or "lawnmower" only brings back terms that exactly match the search input.
Is there something else that needs to be done for multi-word phrases within the synonyms?
Also, how do I expand to three or more terms, for example "lawnmower", "lawn mower", "mower", "grass cutter"?
Thanks
There is an example of multi-word synonyms in the unit tests. You have to split the words yourself and insert a SynonymMap.WORD_SEPARATOR (null character) between them. To make this easier, there is a Join method on SynonymMap.Builder.
String base1 = "lawnmower";
String syn1 = "lawn mower";
SynonymMap.Builder sb = new SynonymMap.Builder(true);
CharsRef syn1Chars = sb.Join(Regex.Split(syn1, " +"), new CharsRef());
sb.Add(new CharsRef(base1), syn1Chars, true);
SynonymMap smap = sb.Build();
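To see what the joined form actually contains, the following sketch mimics the join in plain C# without calling Lucene.NET at all. It assumes, per the description above, that the separator is the null character; JoinedFormDemo and JoinWords are hypothetical names used only for illustration and are not part of the library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class JoinedFormDemo
{
    // Assumption: mirrors SynonymMap.WORD_SEPARATOR, described above as the null character.
    private const char WordSeparator = '\0';

    // Splits a phrase on runs of spaces and glues the words back together
    // with the separator, which is the shape a multi-word synonym needs.
    public static string JoinWords(string phrase)
    {
        string[] words = Regex.Split(phrase.Trim(), " +");
        return string.Join(WordSeparator.ToString(), words);
    }

    public static void Main()
    {
        string joined = JoinWords("lawn mower");
        // Print the separator as a visible "\0" so it can be read on a console.
        Console.WriteLine(string.Join("\\0", joined.Split(WordSeparator)));
    }
}
```

A single-word term such as "wrench" passes through unchanged, so the same helper could be applied to every term regardless of whether it contains spaces.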
Here is an extension method to make quick work of this.
public static class SynonymMapBuilderExtensions
{
    private static readonly Regex Space = new Regex(" +", RegexOptions.Compiled);

    public static void AddPhrase(this SynonymMap.Builder builder, string input, string output, bool keepOrig)
    {
        CharsRef outputRef = builder.Join(Space.Split(output), new CharsRef());
        builder.Add(new CharsRef(input), outputRef, keepOrig);
    }
}
You can then use this extension method whether the synonym has spaces or not, and you don't have to bother with creating the CharsRef objects if you don't need them anywhere else in your code.
String base1 = "lawnmower";
String syn1 = "lawn mower";
String base2 = "spanner";
String syn2 = "wrench";
SynonymMap.Builder sb = new SynonymMap.Builder(true);
sb.AddPhrase(base1, syn1, true);
sb.AddPhrase(base2, syn2, true);
SynonymMap smap = sb.Build();
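The part of the question about expanding to three or more terms is not covered above. One possible approach, sketched here in plain C# with no Lucene.NET dependency, is to generate every ordered (input, output) pair within a group and add each pair to the builder. This assumes that repeated Add calls for the same input accumulate outputs rather than replace them, which should be verified against the Lucene.NET version in use. SynonymGroupSketch and ExpandGroup are hypothetical names, and the null-character join follows the WORD_SEPARATOR description above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SynonymGroupSketch
{
    // Hypothetical helper: returns every ordered (input, output) pair in a group,
    // with multi-word terms joined by the null character on both sides.
    public static List<KeyValuePair<string, string>> ExpandGroup(params string[] terms)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        foreach (string input in terms)
        {
            foreach (string output in terms)
            {
                if (input == output) continue; // a term is not its own synonym
                pairs.Add(new KeyValuePair<string, string>(Join(input), Join(output)));
            }
        }
        return pairs;
    }

    // Assumption: the null character stands in for SynonymMap.WORD_SEPARATOR.
    private static string Join(string phrase)
    {
        return string.Join("\0", Regex.Split(phrase.Trim(), " +"));
    }

    public static void Main()
    {
        var pairs = ExpandGroup("lawnmower", "lawn mower", "mower", "grass cutter");
        Console.WriteLine(pairs.Count); // 4 terms, each mapped to the 3 others
    }
}
```

Each pair could then be handed to the builder in the same way as in the earlier snippets, for example sb.Add(new CharsRef(pair.Key), new CharsRef(pair.Value), true). Whether multi-word inputs match at search time also depends on how the analyzer tokenizes the text, so treat this as a starting point rather than a complete solution.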