
Double wildcard in query causes weird highlighting with the plain and fast vector Elasticsearch highlighters


I'm working with Elasticsearch 1.5.2.

After creating the following mapping:

PUT http://localhost:9200/index/_mapping/sometype
{
    "properties" : {
        "sometext" : {
            "type" : "string",
            "term_vector" : "with_positions_offsets"
        }
    }
}

and indexing this document:

POST http://localhost:9200/index/sometype
{
    "sometext" : "A supervisor is responsible for the productivity and actions of a small group of employees. The supervisor has several manager-like roles, responsibilities, and powers. Two of the key differences between a supervisor and a manager are (1) the supervisor does not typically have hire and fire authority, and (2) the supervisor does not have budget authority."
}

a user tried to find all documents, but typed a double wildcard instead of a single one:

POST http://localhost:9200/index/sometype/_search
{
    "query" : {
        "query_string" : {
            "query" : "**",
            "fields" : ["sometext"]
        }
    },
    "highlight" : {
        "pre_tags" : ["<em>"],
        "post_tags" : ["</em>"],
        "order" : "score",
        "require_field_match" : true,
        "fields" : {
            "sometext" : {
                "fragment_size" : 150,
                "number_of_fragments" : 1
            }
        }
    }
}

and got the following highlight:

"highlight" : {
    "sometext" : ["responsibilities, <em>and</em> <em>powers</em>. <em>Two</em> <em>of</em> <em>the</em> <em>key</em> <em>differences</em> <em>between</em> <em>a</em> <em>supervisor</em> <em>and</em> <em>a</em> <em>manager</em> <em>are</em> (<em>1</em>) <em>the</em> <em>supervisor</em> <em>does</em> <em>not</em> <em>typically</em> <em>have</em> <em>hire</em> <em>and</em> <em>fire</em> <em>authority</em>, and"]
}

The query *? produces the same highlighting results. However, when the query consists of just a single asterisk, the highlighter returns nothing.

With the plain highlighter (I just added "type" : "plain" to highlight), the result looks a bit different, but still weird:

"highlight" : {
    "sometext" : [", <em>responsibilities</em>, <em>and</em> <em>powers</em>. <em>Two</em> <em>of</em> <em>the</em> <em>key</em> <em>differences</em> <em>between</em> <em>a</em> <em>supervisor</em> <em>and</em> <em>a</em> <em>manager</em> <em>are</em> (<em>1</em>) <em>the</em> <em>supervisor</em> <em>does</em> <em>not</em> <em>typically</em> <em>have</em> <em>hire</em> <em>and</em> <em>fire</em> <em>authority</em>, <em>and</em> (<em>2</em>) <em>the</em> <em>supervisor</em> <em>does</em> <em>not</em> <em>have</em> <em>budget</em> <em>authority</em>."]
}
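
To compare the three queries (*, ** and *?) against both highlighter settings without editing the request by hand, the bodies can be generated with a small script. This is only a sketch: the helper names build_search_body and build_all_variants are made up for illustration, the script does not contact Elasticsearch, and it assumes "type" is placed at the top level of "highlight", as described above.

```python
import json


def build_search_body(query, highlighter_type=None):
    """Build the search request body from the question for one query string.

    highlighter_type=None leaves the choice of highlighter to Elasticsearch;
    passing "plain" forces the plain highlighter, as in the second test above.
    """
    highlight = {
        "pre_tags": ["<em>"],
        "post_tags": ["</em>"],
        "order": "score",
        "require_field_match": True,
        "fields": {
            "sometext": {"fragment_size": 150, "number_of_fragments": 1}
        },
    }
    if highlighter_type is not None:
        # Assumption: "type" sits at the top level of "highlight".
        highlight["type"] = highlighter_type
    return {
        "query": {"query_string": {"query": query, "fields": ["sometext"]}},
        "highlight": highlight,
    }


def build_all_variants():
    """Return one request body per (query, highlighter type) combination."""
    queries = ["*", "**", "*?"]
    highlighter_types = [None, "plain"]
    return {
        (q, t): build_search_body(q, t)
        for q in queries
        for t in highlighter_types
    }


if __name__ == "__main__":
    # Print each body so it can be pasted into a POST to .../_search.
    for (query, hl_type), body in build_all_variants().items():
        print("query=%r highlighter=%r" % (query, hl_type))
        print(json.dumps(body, indent=4))
```

Each printed body can be sent to http://localhost:9200/index/sometype/_search to check which combinations highlight every term.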

Does anybody know the reason for this behavior? Do queries like ** and *? have some special meaning? Thanks a lot.


Solution

  • Answered on the Elasticsearch forum: https://discuss.elastic.co/t/double-wildcard-in-string-query-causes-incorrect-highlighting-for-plain-and-fast-vectors-highlighters/45939