Tags: elasticsearch, full-text-search, precision, booleanquery

Elasticsearch - How to guess important words in queries?


Suppose we run the following two queries against our index of available job positions:

  • PHP Developer
  • Ruby Developer

With a simple boolean AND query, positions like PHP Programmer are excluded because they lack the term Developer. With a boolean OR query for PHP Developer, documents containing Ruby Developer are also included in the results.

What is the best way to detect that in the phrase PHP Developer, PHP is more important than Developer?

That is, when searching for PHP Developer, the term PHP MUST appear in every result, while the term Developer should only increase the score.


Solution

  • You can use a regular "match" query and add the "cutoff_frequency" parameter, like this:

    {
        "query": {
            "match": {
                "<field_name>": {
                    "query": "PHP Developer",
                    "operator": "AND",
                    "cutoff_frequency": 0.001
                }
            }
        }
    }
    

    That way, each term that appears in less than 0.1% of the documents is considered "important" and becomes a "must", while the other terms are not required and only increase the score. "Developer" will be more common than "PHP", so "PHP" will be required and "Developer" will be optional, but documents that contain it will be ranked higher. Note that "PHP" might still be pretty common in your index, so you do need to fine-tune the cutoff frequency!
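
To make the split between required and optional terms concrete, here is a minimal Python sketch of the same idea done by hand: it sorts the query terms by document frequency and builds an explicit "bool" query with "must" and "should" clauses. It is only an illustration of the logic, not how Elasticsearch implements it internally. The function name `build_bool_query`, the field name `title`, and the document counts are all hypothetical, and lowercasing plus whitespace splitting is a crude stand-in for the field's real analyzer.

```python
import json


def build_bool_query(field, text, doc_freq, total_docs, cutoff=0.001):
    """Build a bool query: rare terms are required, common terms only add score."""
    must, should = [], []
    # Lowercase + split is a rough stand-in for the field's analyzer.
    for term in text.lower().split():
        # Fraction of documents containing the term; unknown terms count as rare.
        ratio = doc_freq.get(term, 0) / total_docs
        clause = {"match": {field: term}}
        if ratio > cutoff:
            should.append(clause)   # common term: optional, boosts the score
        else:
            must.append(clause)     # rare term: must be present
    if not must:
        # Assumption of this sketch: if every term is common, require all of them
        # so the query does not degenerate into a pure OR.
        must, should = should, []
    return {"query": {"bool": {"must": must, "should": should}}}


# Hypothetical document counts for an index of 100,000 job positions.
doc_freq = {"php": 80, "developer": 20000}
query = build_bool_query("title", "PHP Developer", doc_freq, total_docs=100000)
print(json.dumps(query, indent=2))
```

With these made-up counts, "php" lands in "must" and "developer" in "should". Building the query explicitly like this may also be useful on newer Elasticsearch versions, since "cutoff_frequency" was deprecated in 7.3 and removed in 8.0.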