Highlight words with whitespace in Elasticsearch 7.6

I would like to use Elasticsearch highlighting to obtain the matched keywords found inside a text. These are my settings and mappings:

  "settings": {
    "analysis": {
      "char_filter": {
        "my_char_filter": {
          "type": "mapping",
          "mappings": [
            "- => _",
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
          "filter": [
  "mappings": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "my_analyzer"
        "description": {
          "type": "text",
          "analyzer": "my_analyzer",
          "fielddata": True

I am using a char_filter to search for and highlight hyphenated words. This is my example document:

    "_index": "test_tokenizer",
    "_type": "_doc",
    "_id": "DbBIxXEBL7VGAl98vIRl",
    "_score": 1.0,
    "_source": {
        "title": "Best places: New Mexico and Sedro-Woolley",
        "description": "This is an example text containing some cities like New York, Toronto, Rome and many other. So, there are also Milton-Freewater and Las Vegas!"

This is the query I use:

    "query": {
        "query_string" : {
            "query" : "\"New York\" OR \"Rome\" OR \"Milton-Freewater\"",
            "default_field": "description"
    "highlight" : {
        "pre_tags" : ["<key>"],
        "post_tags" : ["</key>"],
        "fields" : {
            "description" : {
                "number_of_fragments" : 0

This is the output I get:

"hits": [
        "_index": "test_tokenizer",
        "_type": "_doc",
        "_id": "GrDNz3EBL7VGAl98EITg",
        "_score": 0.72928625,
        "_source": {
            "title": "Best places: New Mexico and Sedro-Woolley",
            "description": "This is an example text containing some cities like New York, Toronto, Rome and many other. So, there are also Milton-Freewater and Las Vegas!"
        "highlight": {
            "description": [
                "This is an example text containing some cities like <key>New</key> <key>York</key>, Toronto, <key>Rome</key> and many other. So, there are also <key>Milton-Freewater</key> and Las Vegas!"

Rome and Milton-Freewater are highlighted correctly, but New York is not: its two words are tagged separately.

How can I have <key>New York</key> instead of <key>New</key> and <key>York</key>?


  • There is an open PR regarding this, but I'd suggest the following interim solution:

    1. Add a term_vector setting to the description field; the fvh highlighter used below requires with_positions_offsets
    PUT test_tokenizer
      "settings": {
        "analysis": {
          "char_filter": {
            "my_char_filter": {
              "type": "mapping",
              "mappings": [
                "- => _"
          "analyzer": {
            "my_analyzer": {
              "tokenizer": "standard",
              "char_filter": [
              "filter": [
      "mappings": {
        "properties": {
          "title": {
            "type": "text",
            "analyzer": "my_analyzer"
          "description": {
            "type": "text",
            "analyzer": "my_analyzer",
            "term_vector": "with_positions_offsets",
            "fielddata": true
    2. Index a document
    POST test_tokenizer/_doc
    {"title":"Best places: New Mexico and Sedro-Woolley","description":"This is an example text containing some cities like New York, Toronto, Rome and many other. So, there are also Milton-Freewater and Las Vegas!"}
    3. Convert your query_string into a set of bool-should match_phrase clauses inside the highlight_query, and use type: fvh
    GET test_tokenizer/_search
      "query": {
        "query_string": {
          "query": "\"New York\" OR \"Rome\" OR \"Milton-Freewater\"",
          "default_field": "description"
      "highlight": {
        "pre_tags": ["<key>"],
        "post_tags": ["</key>"],
        "fields": {
          "description": {
            "highlight_query": {
              "bool": {
                "should": [
                    "match_phrase": {
                      "description": "New York"
                    "match_phrase": {
                      "description": "Rome"
                    "match_phrase": {
                      "description": "Milton-Freewater"
            "type": "fvh",
            "number_of_fragments": 0


          "This is an example text containing some cities like <key>New York</key>, Toronto, <key>Rome</key> and many other. So, there are also <key>Milton-Freewater</key> and Las Vegas!"