I have implemented a function_score query in my Document model, which has a clicks field that keeps track of the number of views per document. Now I want documents with more clicks to get higher priority and appear at the top of the search results.
My document.rb code
require 'elasticsearch/model'

# Class header assumed: it was missing from the posted snippet.
class Document < ActiveRecord::Base
  include Elasticsearch::Model

  def self.search(query)
    __elasticsearch__.search(
      {
        query: {
          function_score: {
            query: {
              multi_match: {
                query: query,
                fields: ['name', 'service'],
                fuzziness: "AUTO"
              }
            },
            field_value_factor: {
              field: 'clicks',
              modifier: 'log1p',
              factor: 2
            }
          }
        }
      }
    )
  end

  settings index: { "number_of_shards": 1,
    analysis: {
      analyzer: {
        edge_ngram_analyzer: { type: "custom", tokenizer: "standard",
                               filter: ["lowercase", "edge_ngram_filter", "stop", "kstem"] }
      }
    },
    filter: {
      ascii_folding: { type: 'asciifolding', preserve_original: true },
      edge_ngram_filter: { type: "edgeNGram", min_gram: "3", max_gram: "20" }
    }
  } do
    mapping do
      indexes :name, type: "string", analyzer: "edge_ngram_analyzer",
                     term_vector: "with_positions"
      indexes :service, type: "string", analyzer: "edge_ngram_analyzer",
                        term_vector: "with_positions"
    end
  end
end
Search View is here
<h1>Document Search</h1>
<%= form_for search_path, method: :get do |f| %>
  <p>
    <%= f.label "Search for" %>
    <%= text_field_tag :query, params[:query] %>
    <%= submit_tag "Go", name: nil %>
  </p>
<% end %>
<% if @documents %>
  <ul class="search_results">
    <% @documents.each do |document| %>
      <li>
        <h3>
          <%= link_to document.name, controller: "documents", action: "show",
                      id: document._id %>
        </h3>
      </li>
    <% end %>
  </ul>
<% else %>
  <p>Your search did not match any documents.</p>
<% end %>
<br/>
When I search for Estamp, I get the results in the following order:
Franking and Estamp # clicks 5
Notary and Estamp # clicks 8
So even though Notary and Estamp has more clicks, it does not come to the top of the search results. How can I achieve this?
This is what I get when I run it on the console.
POST _search
"hits": {
  "total": 2,
  "max_score": 1.322861,
  "hits": [
    {
      "_index": "documents",
      "_type": "document",
      "_id": "13",
      "_score": 1.322861,
      "_source": {
        "id": 13,
        "name": "Franking and Estamp",
        "service": "Estamp",
        "user_id": 1,
        "clicks": 7
      }
    },
    {
      "_index": "documents",
      "_type": "document",
      "_id": "14",
      "_score": 0.29015404,
      "_source": {
        "id": 14,
        "name": "Notary and Estamp",
        "service": "Notary",
        "user_id": 1,
        "clicks": 12
      }
    }
  ]
}
Here the score of the documents is not being updated based on the clicks.
Without seeing your indexed data it's not easy to answer, but looking at the query one thing comes to mind. I'll show it with a short example.
I've indexed the following documents:
{"name":"Franking and Estampy", "service" :"text", "clicks": 5}
{"name":"Notary and Estamp", "service" :"text", "clicks": 8}
Running the same query you provided gave this result:
"hits": {
"total": 2,
"max_score": 4.333119,
"hits": [
{
"_index": "script",
"_type": "test",
"_id": "AV2iwkems7jEvHyvnccV",
"_score": 4.333119,
"_source": {
"name": "Notary and Estamp",
"service": "text",
"clicks": 8
}
},
{
"_index": "script",
"_type": "test",
"_id": "AV2iwo6ds7jEvHyvnccW",
"_score": 3.6673431,
"_source": {
"name": "Franking and Estampy",
"service": "text",
"clicks": 5
}
}
]
}
So everything is fine: the document with 8 clicks got the higher score (_score field value) and the order is correct.
I noticed in your query that the name field is boosted with a high factor. So what would happen if I had the following data indexed?
{"name":"Franking and Estampy", "service" :"text", "clicks": 5}
{"name":"text", "service" :"Notary and Estamp", "clicks": 8}
And result:
"hits": {
"total": 2,
"max_score": 13.647502,
"hits": [
{
"_index": "script",
"_type": "test",
"_id": "AV2iwo6ds7jEvHyvnccW",
"_score": 13.647502,
"_source": {
"name": "Franking and Estampy",
"service": "text",
"clicks": 5
}
},
{
"_index": "script",
"_type": "test",
"_id": "AV2iwkems7jEvHyvnccV",
"_score": 1.5597181,
"_source": {
"name": "text",
"service": "Notary and Estamp",
"clicks": 8
}
}
]
}
Although Franking and Estampy has only 5 clicks, it has a much higher score than the second document, which has more clicks.
So the point is that in your query the number of clicks is not the only factor that affects the scoring and the final order of documents. Without the real data this is only a guess on my part. You can run the query yourself with a REST client and check the scores, fields, and matching phrases.
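One way to check the scoring is to ask Elasticsearch to explain each hit. The sketch below only builds the request body; with_explain is a hypothetical helper name, and the body reuses the query from the question. It assumes the standard explain flag of the search API, which adds an _explanation section to every hit.

```ruby
require 'json'

# Hypothetical helper: returns a copy of a search body with per-hit
# score explanations switched on.
def with_explain(body)
  body.merge(explain: true)
end

body = with_explain(
  query: {
    function_score: {
      query: {
        multi_match: { query: 'Estamp', fields: %w[name service], fuzziness: 'AUTO' }
      },
      field_value_factor: { field: 'clicks', modifier: 'log1p', factor: 2 }
    }
  }
)

# Print the JSON so it can be pasted into a REST client as the POST _search body.
puts JSON.pretty_generate(body)
```

The _explanation output shows the text relevance part and the field_value_factor part separately, which makes it easier to see which one dominates.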
Based on your search result, you can see that the document with id=13 has the Estamp term in both fields (name and service). That is why this document got the higher score: with this query, having the term in both fields counts for more than having a higher number of clicks. If you want the clicks field to have a bigger impact on the scoring, experiment with factor (it should probably be higher) and modifier ("modifier": "square" could work in your case). The possible values are listed in the Elasticsearch function_score query documentation.
Try, for example, this combination:
{
  "query": {
    "function_score": {
      "query": {
        ... // same as before
      },
      "field_value_factor": {
        "field": "clicks",
        "modifier": "square",
        "factor": 3
      }
    }
  }
}
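To get a feel for how much each modifier changes the outcome, here is a rough sketch of the arithmetic in plain Ruby. It assumes that field_value_factor applies the modifier to factor * field value, that log1p is the base-10 logarithm of the value plus one, and that the default boost_mode multiplies the function value with the query score. The method names are made up for illustration.

```ruby
# Sketch of field_value_factor: the modifier is applied to (factor * value).
def field_value_factor(clicks, factor:, modifier:)
  x = factor * clicks.to_f
  case modifier
  when 'none'   then x
  when 'log1p'  then Math.log10(x + 1)
  when 'square' then x**2
  else raise ArgumentError, "unsupported modifier: #{modifier}"
  end
end

# boost_mode 'multiply' scales the text score; 'replace' ignores it.
def final_score(query_score, function_value, boost_mode: 'multiply')
  boost_mode == 'replace' ? function_value : query_score * function_value
end

[5, 8].each do |clicks|
  log1p  = field_value_factor(clicks, factor: 2, modifier: 'log1p')
  square = field_value_factor(clicks, factor: 3, modifier: 'square')
  puts format('clicks=%d log1p=%.3f square=%.1f', clicks, log1p, square)
end
```

With log1p and factor 2, 5 and 8 clicks give roughly 1.04 and 1.23, so the function hardly separates the two documents and the text score decides the order. With square and factor 3 the values are 225 and 576, a much wider gap, although a large enough difference in text score can still outweigh it.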
If the value in the clicks field should be the only parameter that affects the scoring, you can use "boost_mode": "replace". In this case only the function score is used and the query score is ignored, so the frequency of the Estamp term in the name and service fields will have no impact on the scoring. Try this query:
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "Estamp",
"fields": [ "name", "service"],
"fuzziness": "AUTO"
}
},
"field_value_factor": {
"field": "clicks",
"factor": 1
},
"boost_mode": "replace"
}
}
}
It gave me:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 5,
"hits": [
{
"_index": "script",
"_type": "test",
"_id": "AV2nI0HkJPYn0YKQxRvd",
"_score": 5,
"_source": {
"name": "Notary and Estamp",
"service": "Notary",
"clicks": 5
}
},
{
"_index": "script",
"_type": "test",
"_id": "AV2nIwKvJPYn0YKQxRvc",
"_score": 4,
"_source": {
"name": "Franking and Estamp",
"service": "Estamp",
"clicks": 4
}
}
]
}
}
This may be what you are looking for (note that the values "_score": 5 and "_score": 4 match the number of clicks).