I can find the target result when there is a space between the words
City Lab
in the query:
curl -XGET http://localhost:9200/companies_company_data3/_search -H 'Content-Type: application/json' -d '{
"query": {
"bool": {
"must": {
"match": {
"name": {
"fuzziness": "AUTO",
"query": "City Lab"
}
}
}
}
},
"size": 5
}'
It gives the expected result:
{
"took": 189,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 7463,
"max_score": 16.600964,
"hits": [
{
"_index": "companies_company_data3",
"_type": "_doc",
"_id": "3232333",
"_score": 16.600964,
"_source": {
"sourceId": "22",
"regionName": "US",
"name": "City Lab",
"id": "3232333"
}
},
But when I remove the space between these two words (CityLab),
it can't find it. Full query:
curl -XGET http://localhost:9200/companies_company_data3/_search -H 'Content-Type: application/json' -d '{
"query": {
"bool": {
"must": {
"match": {
"name": {
"fuzziness": "AUTO",
"query": "CityLab"
}
}
}
}
},
"size": 5
}'
How can I modify the fuzzy query so that the company name "City Lab" is found when the user types "CityLab"?
My index mapping:
curl -XGET http://localhost:9200/companies_company_data3/_mapping -H 'Content-Type: application/json'
returns
{
"companies_company_data3": {
"mappings": {
"_doc": {
"properties": {
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"regionName": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"sourceId": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
The short answer is: search the keyword subfield, not the text field.
"match": {
"name.keyword": {
"fuzziness": "AUTO",
"query": "CityLab"
}
}
More details:
1) Saving the document
First, you should know how the document is stored. When "City Lab" is indexed as a keyword, Elasticsearch saves it as is, as a single term. In other words, when you search for the exact term "City Lab" you will get it.
When it is indexed as text, Elasticsearch saves it as two terms: "city" and "lab".
2) Searching for the document
When you use a match query, the standard analyzer splits "City Lab" into "city" and "lab" and then searches for the two terms. When you apply fuzziness, it is applied to each term separately.
When you search for "CityLab", the analyzer changes it to "citylab" and then searches for it as one term.
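To make the two steps above concrete, here is a minimal Python sketch of what ends up in the index and in the query. `approx_standard_analyze` and `keyword_analyze` are hypothetical helper names, and the regex is only a rough stand-in for the standard analyzer (the real one follows Unicode text segmentation rules), but it behaves the same way for these inputs.

```python
import re


def approx_standard_analyze(text):
    """Rough approximation of the standard analyzer (an assumption, not
    the real implementation): split on non-alphanumeric characters and
    lowercase every token."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


def keyword_analyze(text):
    """A keyword field keeps the whole input as one unchanged term."""
    return [text]


# What the text field "name" stores and what the query is turned into.
print(approx_standard_analyze("City Lab"))
print(approx_standard_analyze("CityLab"))

# What the subfield "name.keyword" stores.
print(keyword_analyze("City Lab"))
```

The text field holds two terms while the query "CityLab" becomes a single term, so the fuzzy comparison is always between "citylab" and one of "city" or "lab".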
3) How the query works
so when you write:
"match": {
"name": {
"fuzziness": "AUTO",
"query": "CityLab"
}
}
knowing that the mapping is:
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
So what we have in the mapping is a name field of type text, and a subfield called keyword of type keyword.
First search case: when you search for "City Lab", the query is analyzed into "city" and "lab", which are searched in the field name, which is of type text. You get results, of course, because the document has "city" and "lab" in the name field.
Second search case: when you search for "CityLab", the query is analyzed into "citylab", which is searched in the field name, which is of type text. You get no results because the document has "city" and "lab" in the name field, and "citylab" is too far from both:
"citylab" -> "city" -> 3 changes
"citylab" -> "lab" -> 4 changes
Both are more than the 2 changes that "fuzziness": "AUTO" allows for a term longer than 5 characters.
4) Solution
Search the keyword subfield instead of the text field, as the keyword field contains "City Lab" as one term.
"match": {
"name.keyword": {
"fuzziness": "AUTO",
"query": "CityLab"
}
}
When you search here for "CityLab", the comparison looks like this:
"CityLab" -> "City Lab" -> 1 change
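The change counts in this answer can be checked with a small edit-distance sketch. This is plain Levenshtein distance written for illustration, not Elasticsearch's own code (Elasticsearch also counts a transposition as one change by default, which makes no difference for these terms). `auto_max_edits` is a hypothetical helper that encodes the default AUTO thresholds: terms of 1-2 characters must match exactly, 3-5 characters allow one change, and longer terms allow two.

```python
def levenshtein(a, b):
    """Edit distance counting single-character inserts, deletes and
    substitutions, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def auto_max_edits(term):
    """Changes allowed by "fuzziness": "AUTO" with default thresholds."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def fuzzy_matches(query_term, indexed_term):
    return levenshtein(query_term, indexed_term) <= auto_max_edits(query_term)


# Text field: the analyzed query term against each indexed term.
print(levenshtein("citylab", "city"), fuzzy_matches("citylab", "city"))
print(levenshtein("citylab", "lab"), fuzzy_matches("citylab", "lab"))

# Keyword subfield: the raw query against the single stored term.
print(levenshtein("CityLab", "City Lab"), fuzzy_matches("CityLab", "City Lab"))

# The keyword subfield is case-sensitive, so an all-lowercase input
# needs 3 changes and no longer matches.
print(levenshtein("citylab", "City Lab"), fuzzy_matches("citylab", "City Lab"))
```

The last line is a caveat worth keeping in mind: name.keyword is not lowercased, so the fuzzy match on it only tolerates the missing space if the user also gets the capitalization close enough.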
Another solution is to change the analyzer, though I guess this is not what you are looking for. In general, it means replacing the standard analyzer with a custom one so that the text is saved as one term, including the spaces.
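As a rough idea of what the custom analyzer option could look like, the sketch below builds an index body in Python. The analyzer name `name_single_term` is made up for illustration, and the `_doc` mapping type is assumed from the mapping in the question (newer Elasticsearch versions drop mapping types). The `keyword` tokenizer and `lowercase` token filter are built into Elasticsearch. Changing the analyzer of an existing field is not possible in place, so this would mean creating a new index and reindexing.

```python
import json

# Hypothetical analyzer name, chosen only for this example.
ANALYZER_NAME = "name_single_term"

index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                ANALYZER_NAME: {
                    "type": "custom",
                    "tokenizer": "keyword",   # whole input becomes one token
                    "filter": ["lowercase"],  # makes matching case-insensitive
                }
            }
        }
    },
    "mappings": {
        "_doc": {  # mapping type assumed from the question's index
            "properties": {
                "name": {"type": "text", "analyzer": ANALYZER_NAME}
            }
        }
    },
}


def simulate_single_term_analyzer(text):
    """What the analyzer above is expected to emit: one lowercased term."""
    return [text.lower()]


print(json.dumps(index_body, indent=2))
print(simulate_single_term_analyzer("City Lab"))
print(simulate_single_term_analyzer("CityLab"))
```

With such an analyzer both the stored name and the user input are reduced to one lowercased term ("city lab" and "citylab"), which are one change apart, so a fuzzy match query on name would be expected to find the document regardless of capitalization.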
Another solution is a wildcard query, where you can search for "City*Lab", but I don't think this is what you are looking for either.
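For completeness, the wildcard option could be sketched as follows. The query body uses the wildcard query against name.keyword, so the pattern is compared with the whole stored term. Python's `fnmatch.fnmatchcase` is used here only as a local approximation of the `*` semantics, not as what Elasticsearch runs internally, and `wildcard_would_match` is a hypothetical helper.

```python
import json
from fnmatch import fnmatchcase

# Wildcard query on the keyword subfield; "*" matches any run of
# characters, including none.
wildcard_query = {
    "query": {
        "wildcard": {
            "name.keyword": {"value": "City*Lab"}
        }
    },
    "size": 5,
}


def wildcard_would_match(pattern, stored_term):
    """Local, case-sensitive approximation of the wildcard comparison."""
    return fnmatchcase(stored_term, pattern)


print(json.dumps(wildcard_query, indent=2))
print(wildcard_would_match("City*Lab", "City Lab"))
print(wildcard_would_match("City*Lab", "CityLab"))
print(wildcard_would_match("City*Lab", "city lab"))
```

The catch is that your application has to decide where to put the "*" in the user's input, which is why it is rarely a good fit for free-text search boxes.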
Note that
Fuzziness calculates changes as an edit distance which is the number of one-character changes needed to turn one term into another. These changes can include: