I have an Elasticsearch index with two fields: “title” and “product_titles”. I want to search for documents that match a query string in both fields, but give more priority to the “title” field. For example, if I search for “A B”, I want the documents that have “A B” in the “title” field to rank higher than the documents that have “A B” in the “product_titles” field. I also want to consider the number of occurrences of the query string in the “product_titles” field as a secondary factor for ranking.
Here is a sample of the expected results, sorted by score, in JSON format:
[
  {
    "title": "A B ...", // title matches
    "product_titles": ["anything", "anything"]
  },
  {
    "title": "anything",
    "product_titles": ["....A B....", " ... A B", "A B ..."] // three matches
  },
  {
    "title": "anything",
    "product_titles": ["....A B....", " ... A B"] // two matches
  },
  {
    "title": "A", // partial match
    "product_titles": ["anything"] // no match
  }
]
Here is one way to achieve this, but you'll have to re-index your data. It works with the data you provided, but test it thoroughly against your own.
First, create the index with a mapping for the title field, and define a custom similarity that controls how the score is computed for that field, like this:
PUT /dummy-index
{
  "settings": {
    "number_of_shards": 1,
    "similarity": {
      "scripted_tf": {
        "type": "scripted",
        "script": {
          "source": "return Math.pow(doc.freq, query.boost)"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "similarity": "scripted_tf"
      },
      "product_titles": {
        "type": "text"
      }
    }
  }
}
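In this request, we define a custom similarity for the title field. For each matching term, it raises the term frequency (doc.freq) to the power of the query boost (query.boost), rather than multiplying the score by the boost. For example, with a boost of 2, a term that occurs three times in title contributes 3^2 = 9.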
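To try this out, the sample documents from the question can be loaded with a bulk request. The sketch below is only an assumption about what that data looks like: the document IDs and the filler values ("C", "X", "Y") are placeholders standing in for the "..." and "anything" entries in the question, and refresh=true is used just so the documents are searchable immediately.

```console
# Placeholder documents modeled on the question's sample; IDs and filler values are illustrative.
POST dummy-index/_bulk?refresh=true
{ "index": { "_id": "1" } }
{ "title": "A B C", "product_titles": ["X", "Y"] }
{ "index": { "_id": "2" } }
{ "title": "X", "product_titles": ["C A B C", "C A B", "A B C"] }
{ "index": { "_id": "3" } }
{ "title": "X", "product_titles": ["C A B C", "C A B"] }
{ "index": { "_id": "4" } }
{ "title": "A", "product_titles": ["X"] }
```

Because scores depend on the documents in the index, the ordering seen with these placeholders may differ from the ordering on real data.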
Then, after indexing the data, the following query gives the desired results:
GET dummy-index/_search
{
  "query": {
    "multi_match": {
      "query": "A B",
      "fields": ["title^2", "product_titles"]
    }
  }
}
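Since this approach should be tested thoroughly, it may help to see how each hit's score was computed. One way to do that is to run the same query with the search API's explain option, as in this sketch; each hit then carries an _explanation section showing the contribution of each field and term.

```console
# Same query as above, with per-hit score explanations enabled.
GET dummy-index/_search
{
  "explain": true,
  "query": {
    "multi_match": {
      "query": "A B",
      "fields": ["title^2", "product_titles"]
    }
  }
}
```

Comparing the explanations for a title match and a product_titles match shows whether the scripted similarity on title outweighs the default scoring on product_titles for your data.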