We are building a search engine with Elasticsearch for internal use in our company. There is a single input field where users enter their search terms (Google-like), so it should be possible to search for several different kinds of words separated by spaces.
Everything is working great, but we have some problems with names. If we search for “Tim Van De Velde”, there are no results for “Tim vandevelde”, and there should be. Keep in mind that we want to keep the spaces between the words so that our and-operator still applies, and that we do not want too many incorrect results.
Any thoughts or ideas on how we could make this possible?
Take a look at our query:
"filtered": {
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "type": "most_fields",
            "query": "Tim Van De Velde",
            "operator": "and",
            "boost": 1,
            "fields": [
              "fullname",
              "alias",
              "name"
            ]
          }
        },
        {
          "multi_match": {
            "type": "most_fields",
            "query": "Tim Van De Velde",
            "operator": "and",
            "fields": [
              "fullname",
              "alias",
              "name"
            ],
            "boost": 0.8,
            "fuzziness": 1
          }
        }
      ]
    }
  }
}
What you're probably looking for is a decompounding analyzer for compound names like the one you mention. Another approach is to use an ngram analyzer, which slides a window of n characters over the name. This gives good recall but somewhat worse precision, so I'd try a decompound analyzer first and then ngrams.
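To make the recall/precision trade-off of ngrams concrete, here is a minimal sketch in plain Python of what a character n-gram window produces. It does not call Elasticsearch; the helper names `char_ngrams` and `shared_ngrams` and the second surname are made up for illustration, and a window size of 3 is just an assumption.

```python
def char_ngrams(text, n):
    """Return every contiguous n-character window of text, left to right."""
    text = text.lower()
    # A text shorter than n yields an empty list.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def shared_ngrams(a, b, n=3):
    """Return the set of n-grams that two strings have in common."""
    return set(char_ngrams(a, n)) & set(char_ngrams(b, n))


# Recall: the separate word "velde" overlaps with the glued form "vandevelde".
print(sorted(shared_ngrams("velde", "vandevelde")))       # ['eld', 'lde', 'vel']

# Precision: an unrelated (hypothetical) surname overlaps as well.
print(sorted(shared_ngrams("vandenberg", "vandevelde")))  # ['and', 'nde', 'van']
```

The second result is the reason precision drops: any name that shares a few character windows with the query becomes a candidate match.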
The following plugin can handle compound words: Analysis Decompound. It works without a dictionary. For a dictionary-based approach, use the Compound Word Token Filter.
The name you mention will be split into the following tokens when using the plugin:
{
  "tokens": [
    {
      "token": "tim"
    },
    {
      "token": "vandeveld"
    },
    {
      "token": "vand"
    },
    {
      "token": "veld"
    }
  ]
}
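For comparison, the dictionary-based alternative mentioned above can be pictured with a toy sketch in plain Python. This is not the actual token filter implementation, only an illustration of the idea: keep the original token and add every word from a word list that occurs inside it. The function name `decompound`, its `min_subword_size` parameter, and the three-word list are all assumptions made for this example.

```python
def decompound(token, dictionary, min_subword_size=2):
    """Return the token followed by each dictionary word found inside it.

    Toy illustration only: scans every substring of at least
    min_subword_size characters and keeps those present in the dictionary,
    in order of first appearance and without duplicates.
    """
    token = token.lower()
    parts = [token]
    for start in range(len(token)):
        for end in range(start + min_subword_size, len(token) + 1):
            piece = token[start:end]
            if piece in dictionary and piece not in parts:
                parts.append(piece)
    return parts


# Hypothetical word list containing the parts of the surname.
name_parts = {"van", "de", "velde"}

print(decompound("vandevelde", name_parts))
# ['vandevelde', 'van', 'de', 'velde']
```

With a word list like this, the glued form yields the same parts as the spaced query "Van De Velde", so an and-operator over the words could still match. The cost is that the word list has to be maintained.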