I need to search by an array of values, and each value can be either plain text or text with asterisks (*).
For example:
["MYULTRATEXT"]
I have the following index (it is really big, so I will simplify it):
................
{
  "settings": {
    "analysis": {
      "char_filter": {
        "asterisk_remove": {
          "type": "pattern_replace",
          "pattern": "(\\d+)*(?=\\d)",
          "replacement": "1$"
        }
      },
      "analyzer": {
        "custom_search_analyzer": {
          "char_filter": [
            "asterisk_remove"
          ],
          "type": "custom",
          "tokenizer": "keyword"
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "keyword",
          "search_analyzer": "custom_search_analyzer"
        },
......................
All data in the index is stored with asterisks (*), e.g.:
curl -X PUT "localhost:9200/locations/_doc/2?pretty" -H 'Content-Type: application/json' -d'
{
  "name" : "MY*ULTRA*TEXT"
}'
I need to get back exactly the same name value when I search with the string MYULTRATEXT:
curl -X POST 'localhost:9200/locations/_search?pretty' -H 'Content-Type: application/json' -d '
{
  "query": { "terms": { "name": ["MYULTRATEXT"] } }
}'
It should return MY*ULTRA*TEXT, but it does not work, and I can't find a workaround. Any thoughts?
I tried pattern_replace, but it seems I am doing something wrong or missing something here.
So I need to replace every * with an empty string ("") while searching.
There appear to be problems with both the regex you provided and the replacement pattern.
I think what you want is:
"char_filter": {
  "asterisk_remove": {
    "type": "pattern_replace",
    "pattern": "(\\w+)\\*(?=\\w)",
    "replacement": "$1"
  }
}
Note the following changes:
- \d => \w (match word characters instead of only digits)
- * => \* (escape the asterisk, since asterisks have a special meaning in regexes)
- 1$ => $1 ($<GROUPNUM> is how you reference captured groups)
To see how Elasticsearch will analyze text with a given analyzer, or to check that you defined an analyzer correctly, you can use Elasticsearch's Analyze API: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-analyze.html
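A minimal sketch of two Analyze API requests, assuming the cluster is reachable at localhost:9200 as in the question's own curl commands, and that it needs the Content-Type header (as Elasticsearch 6.x does). The first request runs the index's custom_search_analyzer as it is currently defined; the second defines the corrected char filter inline, so it can be tried without changing the index settings.

```shell
# 1) Run the locations index's own analyzer (current definition) over a stored value.
curl -X POST "localhost:9200/locations/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "analyzer": "custom_search_analyzer",
  "text": "MY*ULTRA*TEXT"
}'

# 2) Try the corrected pattern_replace char filter ad hoc, with the same keyword tokenizer.
#    The body is single-quoted, so the shell leaves "$1" and the backslashes alone.
curl -X POST "localhost:9200/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "tokenizer": "keyword",
  "char_filter": [
    {
      "type": "pattern_replace",
      "pattern": "(\\w+)\\*(?=\\w)",
      "replacement": "$1"
    }
  ],
  "text": "MY*ULTRA*TEXT"
}'
```

The second response should contain a single token, MYULTRATEXT.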
If you try this API with your current definition of custom_search_analyzer, you will find that "MY*ULTRA*TEXT" is analyzed to "MY*ULTRA*TEXT", not "MYULTRATEXT" as you intend.
I have a personal app that I use to more easily interact with and visualize the results of the ANALYZE API. I tried your example and you can find it here: Elasticsearch Analysis Inspector.