
Getting an exact match with Elasticsearch 6 and the PHP ClientBuilder


I'm building an Elasticsearch-powered layered navigation module for an e-commerce site. It's all working great: I can fetch the options from my external source and display them. Selecting them works too, but I've run into a snag where one of the filter options has these choices:

FINISHES:

Finished (1)

Semi-Finished (16)

Semi Finished (1)

Clearly the two variations, with and without a hyphen, should be tidied up, but ignoring that for a moment, the problem appears when I apply the following to my collection:

$client = $this->clientBuilder;
$params .... etc
$params['body']['query']['bool']['must'][] = ['match_phrase' => [$split[0] => "$selected"]];
$response = $client->search($params);

Here $split[0] is the Elasticsearch field reference for 'FINISHES' and $selected is the chosen value. Whichever option I click, I get all 18 records back, no doubt because they all contain one of the words being searched, 'finished'.

How can I make this search for the exact term only? I've tried escaping the hyphen with \-, which didn't help. I've also tried checking whether the searched term has spaces or hyphens and forcibly adding them to 'must_not', but that didn't work either:

if(!$space) {
    $params['body']['query']['bool']['must_not'][] = ['match' => [$split[0] => ' ']];
}
if(!$hyphen) {
    $params['body']['query']['bool']['must_not'][] = ['match' => [$split[0] => '\\-']];
}

Solution

  • By default, the standard analyzer is applied to every text field. In your case, the value Semi-Finished is split on the hyphen at index time, so the inverted index contains two tokens, semi and finished. That is why a search for finished also matches Semi-Finished. You can confirm the split with the _analyze API:

    POST _analyze
    {
      "analyzer": "standard",
      "text": ["Semi-Finished"]
    }
    
    ##Result
    {
      "tokens" : [
        {
          "token" : "semi",
          "start_offset" : 0,
          "end_offset" : 4,
          "type" : "<ALPHANUM>",
          "position" : 0
        },
        {
          "token" : "finished",
          "start_offset" : 5,
          "end_offset" : 13,
          "type" : "<ALPHANUM>",
          "position" : 1
        }
      ]
    }
    

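    The .keyword sub-field holds the original, non-analyzed text. Assuming your index uses the default dynamic mapping (which adds a keyword sub-field to each string field), an exact-value query such as term against fieldname.keyword should work. The keyword analyzer illustrates the difference, since it keeps the whole value as a single token: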

    POST _analyze
    {
      "analyzer": "keyword",
      "text": ["Semi-Finished"]
    }
    
    ##Result
    {
      "tokens" : [
        {
          "token" : "Semi-Finished",
          "start_offset" : 0,
          "end_offset" : 13,
          "type" : "word",
          "position" : 0
        }
      ]
    }
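Applied to the PHP code in the question, the change could look like the sketch below. It is only a sketch: the index name `products` and the field name `finishes` are hypothetical stand-ins for whatever `$params` and `$split[0]` hold in the real module, and it assumes the index was created with the default dynamic mapping so that a `.keyword` sub-field exists. The `match_phrase` clause is swapped for a `term` query, which compares the stored value as a whole and is case-sensitive.

```php
<?php
// Hypothetical stand-ins for the values used in the question.
$field    = 'finishes';       // assumed field name; the question uses $split[0]
$selected = 'Semi-Finished';  // the option the shopper clicked

$params = [
    'index' => 'products',    // assumed index name
    'body'  => ['query' => ['bool' => ['must' => []]]],
];

// A term query is not analyzed, so "Semi-Finished" is compared as one value
// against the non-analyzed .keyword sub-field instead of the tokens
// "semi" and "finished".
$params['body']['query']['bool']['must'][] = [
    'term' => [$field . '.keyword' => $selected],
];

// Print the request body to inspect it; in the real module this would be
// followed by: $response = $client->search($params);
echo json_encode($params['body'], JSON_PRETTY_PRINT), PHP_EOL;
```

Because the comparison is exact, 'Semi-Finished' and 'Semi Finished' stay separate options until the source data is tidied up. If relevance scoring is not needed for this facet, the same clause can be placed under `filter` instead of `must`.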