
Elasticsearch Dropping Letters in Highlight


I have an Elasticsearch index, and when I apply highlighting to a search, the highlighted fragment drops characters from the start of the field.

Example:

GET /myindex/_search
{
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "query": "greene and associates",
          "fuzziness" : "AUTO",
          "fields": [
            "name"
          ]
        }
      }
    }
  },
  "highlight" : {
    "fields" : {
      "name" : {}
    }
  }
}

Returns the following:

{
     "_index" : "myindex",
         ...       
      "name" : "A. A. Greene & Associates",
      ... 
     },
     "highlight" : {
       "name" : [
         "<em>Greene</em> & <em>Associates</em>"
       ]
     }
}

I would expect the results to be

{
      "_index" : "myindex",
          ...       
       "name" : "A. A. Greene & Associates",
        ... 
      },
      "highlight" : {
        "name" : [
          "A. A. <em>Greene</em> & <em>Associates</em>"
        ]
      }
}

What do I have wrong in this query? No matter what I try, I cannot get the "A. A." to come back in the highlight results.

We're running Elasticsearch v7.4. I've searched for others with this issue but haven't found anything yet.

This is the way the field is defined for the index:

"name" : {
          "type" : "text",
          "boost" : 3.0,
          "fields" : {
            "raw" : {
              "type" : "keyword"
            },
            "suggest" : {
              "type" : "completion",
              "analyzer" : "simple",
              "preserve_separators" : true,
              "preserve_position_increments" : true,
              "max_input_length" : 50
            }
          }
        }

Solution

  • I found what I was missing. The default highlighter is the unified highlighter, which breaks the text into sentences before building fragments. The periods in "A. A." qualify as sentence boundaries, so the initials fall outside the fragment that is returned. I changed the highlighter type to plain and it works as expected.

    New Query:

    GET /myindex/_search
    {
      "query": {
        "bool": {
          "must": {
            "multi_match": {
              "query": "greene and associates",
              "fuzziness" : "AUTO",
              "fields": [
                "name"
              ]
            }
          }
        }
      },
      "highlight" : {
        "fields" : {
          "name" : { "type" : "plain" }
        }
      }
    }
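
    If you would rather keep the default unified highlighter, another option that may work for short fields like this one is to turn off fragmenting. According to the Elasticsearch highlighting documentation, setting number_of_fragments to 0 returns the entire contents of the field with highlights applied instead of individual fragments. This is an untested sketch against the same index and field, so treat it as a starting point:

    GET /myindex/_search
    {
      "query": {
        "bool": {
          "must": {
            "multi_match": {
              "query": "greene and associates",
              "fuzziness" : "AUTO",
              "fields": [
                "name"
              ]
            }
          }
        }
      },
      "highlight" : {
        "fields" : {
          "name" : { "number_of_fragments" : 0 }
        }
      }
    }

    This suits short values such as names or titles. For long text fields it would return the whole field in every hit, so the plain highlighter above may be the better fit there.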