elasticsearch lucene nest

Elasticsearch synonym issue


I've looked at the other questions about this problem, but they don't seem to help.

I need an input of "i phone" or "i Phone" to be queried as "iPhone" in Elasticsearch.

As you can see in the code below, I have tried almost everything I can think of, including the simple rule "phone => iPhone", which leaves the "i" hanging around so that I could possibly add it to the stopwords.

I've tried using "simple", "keyword", "standard" and "whitespace" as the tokenizer for my custom analyzer.

Can anyone spot where I've gone wrong? This is the last problem before I can finish my project, so any help would be appreciated. Thanks.

P.S. Bonus points if you include how I can do auto-suggest on inputs. Thanks.

Below is my code:

public static CreateIndexDescriptor GetMasterProductDescriptor(string indexName = "shopmaster")
        {
            var indexDescriptor = new CreateIndexDescriptor(indexName)
                .Settings(s => s
                            .Analysis(a => a
                                .TokenFilters(t => t
                                    .Stop("my_stop", st => st
                                        .StopWords("_english_", "new", "cheap")
                                        .RemoveTrailing()
                                    )
                                    .Synonym("my_synonym", st => st
                                        .Synonyms(
                                            "phone => iPhone"
                                        //"i phone => iPhone",
                                        //"i Phone => iPhone"
                                        )
                                    )
                                    .Snowball("my_snowball", st => st
                                        .Language(SnowballLanguage.English)
                                    )
                                )
                                .Analyzers(an => an
                                    .Custom("my_analyzer", ca => ca
                                        .Tokenizer("simple")
                                        .Filters(
                                            "lowercase",
                                            "my_stop",
                                            "my_snowball",
                                            "my_synonym"
                                        )
                                    )
                                )
                            )
                        )
                .Mappings(
                    ms => ms.Map<MasterProduct>(
                        m => m.AutoMap()
                            .Properties(
                                ps => ps
                                    .Nested<MasterProductAttributes>(p => p.Name(n => n.MasterAttributes))
                                    .Nested<MasterProductAttributes>(p => p.Name(n => n.ProductAttributes))
                                    .Nested<MasterProductAttributeType>(p => p.Name(n => n.MasterAttributeTypes))
                                    .Nested<Feature>(p => p.Name(n => n.Features))
                                    .Nested<RelatedProduct>(p => p.Name(n => n.RelatedProducts))
                                    .Nested<MasterProductItem>(
                                        p => p.Name(
                                                n => n.Products
                                            )
                                            .Properties(prop => prop.Boolean(
                                                b => b.Name(n => n.InStock)
                                            ))
                                    )
                                    .Boolean(b => b.Name(n => n.InStock))
                                    .Number(t => t.Name(n => n.UnitsSold).Type(NumberType.Integer))
                                    .Text(
                                        tx => tx.Name(e => e.ManufacturerName)
                                            .Fields(fs => fs.Keyword(ss => ss.Name("manufacturer"))
                                                    .TokenCount(t => t.Name("MasterProductId")
                                                            .Analyzer("my_analyzer")
                                                    )
                                            )
                                            .Fielddata())
                                    //.Completion(cm=>cm.Analyzer("my_analyser")
                                    )
                    )
                );
            return indexDescriptor;
        }

Solution

  • The order of your filters matters!

    You are applying lowercase, then stop words, then a stemmer (snowball), then synonyms. Your synonyms contain capital letters, but by the time they are applied, lowercasing has already occurred. It's a good idea to apply lowercasing first, to make sure case doesn't affect matching of the synonyms, but in that case your replacements shouldn't contain caps.

    Stemmers should not be applied before synonyms (unless you know what you are doing and are deliberately comparing post-stemming terms). Snowball, I believe, will transform 'iphone' to 'iphon', so a synonym rule written against 'iphone' would no longer match after stemming. This is another area where you are running into trouble.

    Try this order instead:

    "lowercase",
    "my_synonym",
    "my_stop",
    "my_snowball"

    (And don't forget to remove the caps from your synonyms, e.g. "i phone => iphone".)
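
    To make the reordering concrete, here is a minimal sketch of just the analysis settings with the filters in the suggested order and an all-lowercase synonym rule. It reuses the filter names and NEST calls from the question. The class and method names are hypothetical, the mappings are omitted, and the "standard" tokenizer is an assumption on my part (it is one of the tokenizers the question says it tried), so treat this as a starting point rather than a verified fix.

    ```csharp
    using Nest;

    // Hypothetical container class, used only so the sketch is self-contained.
    public static class SynonymAnalysisSketch
    {
        // Builds only the analysis settings; add the mappings from the question as needed.
        public static CreateIndexDescriptor GetDescriptor(string indexName = "shopmaster")
        {
            return new CreateIndexDescriptor(indexName)
                .Settings(s => s
                    .Analysis(a => a
                        .TokenFilters(t => t
                            .Stop("my_stop", st => st
                                .StopWords("_english_", "new", "cheap")
                                .RemoveTrailing()
                            )
                            .Synonym("my_synonym", st => st
                                // All lowercase, because "lowercase" runs before this filter.
                                .Synonyms("i phone => iphone")
                            )
                            .Snowball("my_snowball", st => st
                                .Language(SnowballLanguage.English)
                            )
                        )
                        .Analyzers(an => an
                            .Custom("my_analyzer", ca => ca
                                // Assumption: "standard" tokenizer; swap in whichever one suits your data.
                                .Tokenizer("standard")
                                // Order matters: lowercase, then synonyms, then stop words, then stemming.
                                .Filters("lowercase", "my_synonym", "my_stop", "my_snowball")
                            )
                        )
                    )
                );
        }
    }
    ```

    With this order, "i phone", "i Phone" and "iPhone" should all be lowercased first, the two-word forms should then be rewritten to "iphone" by the synonym filter, and stemming happens last, so all three should end up as the same term. You can check what the analyzer actually emits by sending sample text to the index's _analyze endpoint. Also note that the analyzer only takes effect on fields that are mapped to use it.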