Matching two words as a single word


Consider that I have a document which has a field with the following content: 5W30 QUARTZ INEO MC 3 5L

A user wants to be able to search for MC3 (no space) and get this document; however, a search for MC 3 (with a space) should also work. Moreover, there may be documents that store the value without the space (MC3), and those should also be found when querying MC 3.

I tried indexing the value without spaces (e.g. 5W30QUARTZINEOMC35L), but that does not really work: with a wildcard search it matches too much (e.g. MC35 would also match), whereas I only want to match two exact adjacent words concatenated together, as well as an exact single word.
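To make the over-matching concrete, here is a small Python illustration. It assumes plain substring containment as a stand-in for a `*query*` wildcard search on the space-stripped value; `wildcard_match` is a hypothetical helper, not an Elasticsearch call.

```python
# Hypothetical illustration: substring matching (standing in for a
# *query* wildcard) on the value with all spaces removed.
content = "5W30 QUARTZ INEO MC 3 5L"
squashed = content.replace(" ", "")  # "5W30QUARTZINEOMC35L"


def wildcard_match(query: str) -> bool:
    """Approximate a *query* wildcard search on the squashed value."""
    return query.upper() in squashed


for q in ["MC3", "MC35", "INEOMC3"]:
    print(q, wildcard_match(q))
# MC3 True      (wanted: two adjacent words)
# MC35 True     (unwanted: cuts into the word 5L)
# INEOMC3 True  (unwanted: spans three words)
```

Once the word boundaries are gone, there is no way to tell a legitimate two-word join from an arbitrary substring.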

So far I'm thinking of additionally indexing every pair of adjacent words, e.g. 5W30QUARTZ, QUARTZINEO, INEOMC, MC3, 35L. However, does Elasticsearch have a native solution for this?
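The manual approach amounts to joining each word with the one that follows it. A minimal Python sketch, assuming words are separated by whitespace (`adjacent_pairs` is a hypothetical name), reproduces exactly the pairs listed above:

```python
def adjacent_pairs(text: str) -> list[str]:
    """Concatenate each whitespace-separated word with the next one."""
    words = text.split()
    # zip(words, words[1:]) yields (word_i, word_i+1) for each neighbour pair
    return [a + b for a, b in zip(words, words[1:])]


print(adjacent_pairs("5W30 QUARTZ INEO MC 3 5L"))
# ['5W30QUARTZ', 'QUARTZINEO', 'INEOMC', 'MC3', '35L']
```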


Solution

  • I'm pretty sure what you want can be done with the shingle token filter. Depending on your mapping, you would add a filter like the one below to the analyzer used by your content field, so that adjacent tokens are also indexed as pairs:

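    "filter_shingle": {
       "type": "shingle",
       "max_shingle_size": 2,
       "min_shingle_size": 2,
       "output_unigrams": true,
       "token_separator": ""
    }

    Note that the size and unigram settings are already the defaults; I only spelled them out for clarity. The setting that does matter here is "token_separator": it defaults to a single space, so without setting it to "" the shingle for MC 3 would be indexed as "MC 3" rather than "MC3".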
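As a rough check of the idea, the following Python sketch models which terms an analyzer chain of whitespace tokenizer, lowercase filter, and the shingle filter above would index, assuming the same analyzer is used at search time. It is not Elasticsearch: `analyze` and `matches` are hypothetical helpers, and "any query term was indexed" is only a simplified stand-in for a `match` query with the default `or` operator.

```python
# Hypothetical, simplified model of the analyzer chain:
# whitespace tokenizer -> lowercase -> shingle (size 2, unigrams kept,
# token_separator ""). It only mimics which terms would be indexed.


def analyze(text: str) -> set[str]:
    """Return the unigrams plus two-word shingles joined with no separator."""
    tokens = text.lower().split()
    shingles = [a + b for a, b in zip(tokens, tokens[1:])]
    return set(tokens) | set(shingles)


def matches(query: str, document: str) -> bool:
    """Rough stand-in for an "or" match query: true if any analyzed
    query term also appears among the document's indexed terms."""
    return bool(analyze(query) & analyze(document))


doc_with_space = "5W30 QUARTZ INEO MC 3 5L"
doc_without_space = "5W30 QUARTZ INEO MC3 5L"

print(matches("MC3", doc_with_space))      # True  (shingle "mc3")
print(matches("MC 3", doc_without_space))  # True  (query shingle "mc3")
print(matches("MC35", doc_with_space))     # False (no term "mc35")
```

One caveat of the simplified `or` semantics: because unigrams are kept, a query like MC 3 also matches documents that merely contain MC, so you may want to tune the query operator or scoring for your use case.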