I am trying to implement a fuzzy/autocomplete search in Elasticsearch through Node.js. I have indexed the data in an index named "artist". Here is an example of a document stored in ES.
{
  "hits": [{
    "_index": "artist",
    "_type": "_doc",
    "_id": "EyejqnAB2pHGVJHwV53Q",
    "_score": 1,
    "_source": {
      "kind": "song",
      "artistId": 111051,
      "artistName": "Eminem",
      "trackName": "Crack a Bottle (feat. Dr. Dre & 50 Cent)",
      "collectionName": "Relapse (Deluxe Version)",
      "collectionCensoredName": "Relapse (Deluxe Version)",
      "artistViewUrl": "https://music.apple.com/us/artist/eminem/111051?uo=4",
      "collectionViewUrl": "https://music.apple.com/us/album/crack-a-bottle-feat-dr-dre-50-cent-feat-dr-dre-50-cent/1440558626?i=1440558826&uo=4",
      "trackViewUrl": "https://music.apple.com/us/album/crack-a-bottle-feat-dr-dre-50-cent-feat-dr-dre-50-cent/1440558626?i=1440558826&uo=4",
      "previewUrl": "https://audio-ssl.itunes.apple.com/itunes-assets/AudioPreview128/v4/da/a5/c1/daa5c140-2c3d-1f74-40c3-b6e596e52b82/mzaf_7480202713407880256.plus.aac.p.m4a",
      "artworkUrl100": "https://is1-ssl.mzstatic.com/image/thumb/Music128/v4/c5/f8/fd/c5f8fdf6-d4c9-85c9-d169-c5d349a44f1c/source/100x100bb.jpg",
      "collectionPrice": 12.99,
      "releaseDate": "2009-02-02T12:00:00Z",
      "collectionExplicitness": "explicit",
      "trackExplicitness": "explicit",
      "discCount": 1,
      "discNumber": 1,
      "trackCount": 24,
      "trackNumber": 18,
      "country": "USA",
      "currency": "USD"
    }
  }]
}
In the document above, artistName has the value Eminem. The problem is that when I type 'e', nothing is returned, and the same happens for 'em', 'emi', and 'emin'. Only when I type 'emine' do I start getting results. Where am I going wrong?
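There are multiple ways to implement autocomplete, and fuzzy search is not the right one. Fuzzy search is mainly used to find documents whose tokens are close to the search terms (for example, de-duplication) and in spell-checkers. A fuzzy query only matches indexed terms within a small edit distance of the input (at most two edits in Elasticsearch), so a short input such as e or em is simply too far from eminem to match.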
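To make the behaviour in the question concrete, here is a minimal sketch in plain JavaScript. It assumes the query was run with "fuzziness": "AUTO" (the question does not show the query, so this is a guess), where inputs of 1-2 characters must match exactly, 3-5 characters allow one edit, and longer inputs allow two. The function names are made up for illustration, and the sketch uses plain Levenshtein distance; Elasticsearch also counts a transposition as a single edit by default, which makes no difference for these inputs.

```javascript
// Plain Levenshtein distance (insert, delete, substitute), single-row DP.
function editDistance(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0]; // distance for (i-1, j-1)
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // distance for (i-1, j)
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diagonal + cost);
      diagonal = above;
    }
  }
  return row[b.length];
}

// Number of edits allowed for an input of this length under "AUTO" (assumed setting).
function autoFuzziness(inputLength) {
  if (inputLength <= 2) return 0;
  if (inputLength <= 5) return 1;
  return 2;
}

// True when the indexed term is within the allowed number of edits of the input.
function wouldFuzzyMatch(input, indexedTerm) {
  return editDistance(input, indexedTerm) <= autoFuzziness(input.length);
}

for (const input of ["e", "em", "emi", "emin", "emine"]) {
  console.log(input, editDistance(input, "eminem"), wouldFuzzyMatch(input, "eminem"));
}
// e 5 false
// em 4 false
// emi 3 false
// emin 2 false
// emine 1 true
```

Under that assumption, emine is the first input within the allowed distance of eminem, which matches what you are seeing.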
In your case, I would suggest using the prefix query if your index isn't huge, and restricting the minimum input length to two characters: don't search for e, and show results only once the user has typed two or more characters, e.g. em, emi, emin, etc.
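On the Node.js side, the two-character rule could be enforced before the request is ever sent. The helper below is only a sketch: buildPrefixQuery and its minLength parameter are hypothetical names, not part of any Elasticsearch client. It lowercases the input because the prefix query is a term-level query and does not run the input through the field's analyzer, while the default analyzer on a text field lowercases the indexed terms.

```javascript
// Hypothetical helper: turns raw user input into a prefix query body,
// or returns null when the input is too short to be worth searching.
function buildPrefixQuery(input, minLength = 2) {
  // Trim and lowercase so that "Em" matches the lowercased indexed term "eminem".
  const term = String(input).trim().toLowerCase();
  if (term.length < minLength) {
    return null; // caller should skip the search and show no suggestions
  }
  return {
    query: {
      prefix: {
        artistName: { value: term },
      },
    },
  };
}

console.log(buildPrefixQuery("e"));
// null
console.log(JSON.stringify(buildPrefixQuery("Em")));
// {"query":{"prefix":{"artistName":{"value":"em"}}}}
```

The returned object can be used as the search request body with whichever client you already have; when it is null, simply don't call Elasticsearch.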
Index mapping
{
  "mappings": {
    "properties": {
      "artistName": {
        "type": "text"
      }
    }
  }
}
Index sample docs
{
  "artistName" : "Eminem"
}
{
  "artistName" : "Emiten"
}
Search query
{
  "query": {
    "prefix": {
      "artistName": {
        "value": "em"
      }
    }
  }
}
Search result
{
  "_index": "so-60558525-auto",
  "_type": "_doc",
  "_id": "1",
  "_score": 1.0,
  "_source": {
    "artistName": "Eminem"
  }
},
{
  "_index": "so-60558525-auto",
  "_type": "_doc",
  "_id": "2",
  "_score": 1.0,
  "_source": {
    "artistName": "Emiten"
  }
}
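Since your real index stores one document per song, the same artist will likely come back in many hits. A small, hypothetical helper like the one below could turn a search response into a list of unique suggestions. It assumes the usual response shape with the hits under hits.hits, and the function name is made up for this example.

```javascript
// Hypothetical helper: collects unique artistName values from a search
// response body, keeping the order in which Elasticsearch returned them.
function extractArtistNames(response) {
  const hits = (response && response.hits && response.hits.hits) || [];
  const names = hits
    .map((hit) => hit._source && hit._source.artistName)
    .filter((name) => typeof name === "string");
  return [...new Set(names)]; // a Set preserves insertion order
}

// Sample response modelled on the search result above, plus one duplicate.
const sampleResponse = {
  hits: {
    hits: [
      { _id: "1", _score: 1.0, _source: { artistName: "Eminem" } },
      { _id: "2", _score: 1.0, _source: { artistName: "Emiten" } },
      { _id: "3", _score: 1.0, _source: { artistName: "Eminem" } },
    ],
  },
};

console.log(extractArtistNames(sampleResponse));
// [ 'Eminem', 'Emiten' ]
```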
There are broadly four approaches to implementing autocomplete, and each comes with trade-offs that you should weigh against your functional requirements as well as your non-functional ones (performance, maintenance, implementation difficulty).