
Elasticsearch: highlight on query terms, not filter terms?


Say I have this:

search_object = {
    'query': { 
        'bool' : { 
            'must' : { 
                'simple_query_string' : { 
                    'query': search_text,
                    'fields': [ 'french_no_accents', 'def_no_accents', ],
                },  
            },
            'filter' : [ 
                { 'term' : { 'def_no_accents' : 'court', }, },
                { 'term' : { 'def_no_accents' : 'bridge', }, },
            ],
              
        },
    },
    'highlight': {
        'encoder': 'html',
        'fields': {
            'french_no_accents': {},
            'def_no_accents': {},
        },
        'number_of_fragments' : 0,
    },        
}

Whatever search string I enter as search_text, its constituent terms are highlighted, but so are "court" and "bridge". I don't want "court" or "bridge" to be highlighted.

I've tried moving the "highlight" key-value pair to other places in the structure, but nothing seems to work (i.e. a syntax exception is thrown).

More generally, is there a formal grammar anywhere specifying what you can and can't do with ES (v7) queries?


Solution

  • You could add a highlight_query to the highlighted fields so that only the text query, and not the filter terms, determines what gets highlighted:

    {
      "query": {
        "bool": {
          "must": {
            "simple_query_string": {
              "query": "abc",
              "fields": [
                "french_no_accents",
                "def_no_accents"
              ]
            }
          },
          "filter": [
            { "term": { "def_no_accents": "court" } },
            { "term": { "def_no_accents": "bridge" } }
          ]
        }
      },
      "highlight": {
        "encoder": "html",
        "fields": {
          "*_no_accents": {
            "highlight_query": {
              "simple_query_string": {
                "query": "abc",
                "fields": [ "french_no_accents", "def_no_accents" ]
              }
            }
          }
        },
        "number_of_fragments": 0
      }
    }
    

    I've used a wildcard for the two fields (*_no_accents). If that pattern also matches fields you don't want highlighted, you'll need to duplicate the highlight query on two separate, non-wildcard highlight fields like you originally had. That seems unlikely to matter here, though, since your simple_query_string query targets two concrete fields.
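
For the Python dict from the question, the non-wildcard variant could be sketched roughly like this. The helper name build_search_body and its parameters are made up for illustration and are not part of any Elasticsearch client; the body layout simply mirrors the JSON above, with one explicit highlight entry per field.

```python
def build_search_body(search_text, fields, filter_terms, filter_field):
    """Build a search body whose highlighting reflects only the text query.

    Hypothetical helper for illustration; not part of any Elasticsearch client.
    """
    # The same text query is used for matching and for highlighting.
    text_query = {
        "simple_query_string": {
            "query": search_text,
            "fields": list(fields),
        }
    }
    return {
        "query": {
            "bool": {
                "must": text_query,
                # Filter terms restrict the results but should not be highlighted.
                "filter": [{"term": {filter_field: term}} for term in filter_terms],
            }
        },
        "highlight": {
            "encoder": "html",
            # One explicit entry per field instead of the "*_no_accents" wildcard.
            "fields": {field: {"highlight_query": text_query} for field in fields},
            "number_of_fragments": 0,
        },
    }


body = build_search_body(
    search_text="abc",
    fields=["french_no_accents", "def_no_accents"],
    filter_terms=["court", "bridge"],
    filter_field="def_no_accents",
)
```

The resulting dict would then be sent as the search body in the same way as the original search_object. Because the highlight query is built from the same search_text as the main query, the two cannot drift apart.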


    As to:

    More generally, is there a formal grammar anywhere specifying what you can and can't do with ES (v7) queries?

    what exactly are you looking for?