Elasticsearch noob here, trying to understand something.
I have this query:
{
  "size": 10,
  "_source": "pokemon.name",
  "query": {
    "bool": {
      "minimum_should_match": 1,
      "should": [
        {
          "multi_match": {
            "_name": "name-match",
            "type": "phrase",
            "fields": ["pokemon.name"],
            "operator": "or",
            "query": "pika"
          }
        },
        {
          "multi_match": {
            "_name": "weight-match",
            "type": "most_fields",
            // I use multi_match because I'm not sure how to change it to match
            "fields": ["pokemon.weight"],
            "query": "10kg"
          }
        }
      ]
    }
  }
}
The issue is that pokemon.weight has a space between the value and the unit (10 Kg), so I need to ignore the whitespace in order to match 10kg.
I've tried changing the tokenizer, but sadly it can only decide where to split, not remove a character. In any case, I don't know how to use it, and the documentation isn't very helpful: it explains the theory but not the practice.
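For reference, here is what the default standard analyzer does with the stored value, checked via the Analyze API (a minimal sketch; no index needed):

POST /_analyze
{
  "analyzer": "standard",
  "text": "10 Kg"
}

It returns the two tokens 10 and kg, so the single search term 10kg never lines up with either of them.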
Thanks! Any learning resource will be much appreciated.
You need to define a custom analyzer with a char filter that replaces the space character with an empty string, so that instead of the two tokens generated in your case, 10 and g, a single token 10g is produced. I tried it locally and it works fine for me.
Bonus links for understanding how analysis works in ES and an example of a custom analyzer with char filters.
Below is my custom analyzer to achieve the required tokens:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_char_filter"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "mapping",
          "mappings": [
            "\\u0020=>"
          ]
        }
      }
    }
  }
}
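One thing the snippet above doesn't show: the analyzer still has to be applied to the field in the index mapping, and since the stored value is 10 Kg (capital K) while the query is 10kg, you'd likely also want a lowercase token filter. A sketch of the full index creation, assuming Elasticsearch 7+ mapping syntax and the field name from the question (the index name pokemon_index is made up):

PUT /pokemon_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": ["my_char_filter"],
          "filter": ["lowercase"]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "mapping",
          "mappings": ["\\u0020=>"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "pokemon": {
        "properties": {
          "name": { "type": "text" },
          "weight": {
            "type": "text",
            "analyzer": "my_analyzer"
          }
        }
      }
    }
  }
}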
Now, using the same analyzer, it generates the token below, which I confirmed using the Analyze API.
Endpoint: http://{{your_hostname}}:9500/{{your_index_name}}/_analyze
Body:

{
  "analyzer": "my_analyzer",
  "text": "10 g"
}
Result:

{
  "tokens": [
    {
      "token": "10g",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    }
  ]
}
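To verify end to end, you could index a document with the spaced value and then search with the spaceless term; a sketch using the made-up pokemon_index from above:

PUT /pokemon_index/_doc/1
{
  "pokemon": { "name": "Pikachu", "weight": "10 Kg" }
}

POST /pokemon_index/_search
{
  "query": {
    "match": {
      "pokemon.weight": "10kg"
    }
  }
}

Both the indexed 10 Kg and the query term 10kg normalize to the same token 10kg (the char filter strips the space, the lowercase filter handles the capital K), so the match clause now hits.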