Main question
The user is looking for a name and enters the part of the it, let's say au
, and the document with the text paul
is found.
I would like to have the doc highlighted like p<em>au</em>l
.
How can I achieve it if I have a complex search query (combination of match, prefix, wildcard to rule relevance)?
Sub question
When do highlight settings from documentation for type
, boundary_scanner
and boundary_chars
come into play? As per my tests described below, these settings don't change highlighted part.
Try 1: Wildcard query with default analyzer
PUT myindex
{
"mappings": {
"properties": {
"name": {
"type": "text",
"term_vector": "with_positions_offsets"
}
}
}
}
POST myindex/_doc/1
{
"name": "paul"
}
GET myindex/_search
{
"query": {
"wildcard": {"name": "*au*"}
},
"highlight": {
"fields": {
"name": {}
},
"type": "fvh",
"boundary_scanner": "chars",
"boundary_chars": "abcdefghijklmnopqrstuvwxyz.,!? \t\n"
}
}
This kind of search returns highlight <em>paul</em>
but I need to get p<em>au</em>l
.
Try 2: Match query with NGRAM analyzer
This one works as described in SO question: Highlighting part of word in elasticsearch
PUT myindexngram
{
"settings": {
"analysis": {
"tokenizer": {
"ngram_tokenizer": {
"type": "nGram",
"min_gram": "2",
"max_gram": "3",
"token_chars": [
"letter",
"digit"
]
}
},
"analyzer": {
"index_ngram_analyzer": {
"type": "custom",
"tokenizer": "ngram_tokenizer",
"filter": [
"lowercase"
]
},
"search_term_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": "lowercase"
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "index_ngram_analyzer",
"term_vector": "with_positions_offsets"
}
}
}
}
POST myindexngram/_doc/1
{
"name": "paul"
}
GET myindexngram/_search
{
"query": {
"match": {"name": "au"}
},
"highlight": {
"fields": {
"name": {}
}
}
}
This highlights p<em>au</em>l
as desired but:
match
and wildcard
will again result in <em>paul</em>
.type
, boundary_scanner
and boundary_chars
settings.Elastic version 7.13.4
Response from Elasticsearch team:
A highlighter works on terms, so only full terms can be highlighted - whatever are the terms in your index. In your second example,
au
could be highlighted, because it it a term in the index, which is not the case for your first example. There is also an option to define your ownhighlight_query
that could be different from the mainquery
, but this could lead to unpredictable highlights.
https://discuss.elastic.co/t/configure-highlighted-part/295164