Elasticsearch: partial matching on product type names


Laptops often have a "type name", e.g. Lenovo T430, Lenovo T430P, Lenovo T430S, and so on.

The user expects to find all variants of the T430 when searching for T430.

But the standard analyzers in Elasticsearch tokenize on whitespace, non-alphanumeric characters and so on, so T430S is indexed as a single token.

As a result, a search for T430 returns only the T430 variant and not the other variants.

What is the best way to solve this? I have thought about these solutions:

  • Detect that the user is searching for a product type and convert the search to a wildcard search, e.g. T430*. This is difficult to scale.

  • Build an analyzer that understands the different kinds of product types and can derive a T430 token from T430S.


Solution

  • You can use a prefix query, which performs better than a wildcard query. For this, the field must not be analyzed, as below:

    "type_name": {"type": "string", "index": "not_analyzed"}
    
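    A prefix query against that field could look like the sketch below. It assumes type_name holds only the type code (T430, T430P, T430S) rather than the full product name, and that the body is sent to the index's _search endpoint. Because the field is not analyzed, the match is case-sensitive.

    ```json
    {
        "query" : {
            "prefix" : {
                "type_name" : "T430"
            }
        }
    }
    ```

    Under those assumptions this should return T430, T430P and T430S alike, since all three values start with T430.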

    Another way is to use the edge n-gram tokenizer. It may increase your index size, but it gives better query performance because the prefixes are computed once at index time rather than on every search.

    You can define a custom analyzer as below:

    {
        "settings" : {
            "analysis" : {
                "analyzer" : {
                    "my_analyzer" : {
                        "tokenizer" : "customedgeNgram"
                    }
                },
                "tokenizer" : {
                    "customedgeNgram" : {
                        "type" : "edgeNGram",
                        "min_gram" : "3",
                        "max_gram" : "10"
                    }
                }
            }
        }
    }
    
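    One caveat: with its default settings the edge n-gram tokenizer treats the whole field value as a single token, so a value such as "Lenovo T430S" would yield grams like Len, Leno, Lenov and so on, and never T430. If the field holds full product names, a possible variation is to add token_chars so the text is first split on anything that is not a letter or digit. This is a sketch, assuming your Elasticsearch version supports the token_chars setting:

    ```json
    {
        "settings" : {
            "analysis" : {
                "analyzer" : {
                    "my_analyzer" : {
                        "tokenizer" : "customedgeNgram"
                    }
                },
                "tokenizer" : {
                    "customedgeNgram" : {
                        "type" : "edgeNGram",
                        "min_gram" : "3",
                        "max_gram" : "10",
                        "token_chars" : ["letter", "digit"]
                    }
                }
            }
        }
    }
    ```

    With this variation, "Lenovo T430S" is split into Lenovo and T430S before the grams are built, so T43, T430 and T430S are all indexed.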

    Adjust min_gram and max_gram to your needs. With the values above, a value of T430S is indexed as the tokens T43, T430 and T430S. Then use the analyzer on your field:

    "type_name": {"type": "string", "analyzer": "my_analyzer"}
    

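    Now you can use a simple term query on the field type_name. A term query is not analyzed, so the search text T430 is compared directly against the indexed prefix tokens and matches every variant that starts with T430.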
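
    As a rough illustration, the term query body could look like this, again assuming it is sent to the index's _search endpoint. The analyzer above has no lowercase filter, so the search text must match the indexed case exactly.

    ```json
    {
        "query" : {
            "term" : {
                "type_name" : "T430"
            }
        }
    }
    ```

    Search text shorter than min_gram (for example T4) matches nothing, because no token that short is indexed.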