I'm studying Sphinx and thinking-sphinx and I need your opinion and help. What I want to do is the following:
I have a list of news items (noticias) and I want to order the results by both date and relevance. At the moment, when I search for something, the query doesn't take into account when the news item was created. If I could at least specify that items from the closest year, or year and month, are more relevant, my problem would be solved.
I've seen a lot of material, but nothing very conclusive, maybe because of my limited experience with Sphinx and thinking-sphinx.
How can I solve this problem? What do you think is the best way? Thanks.
My model:
define_index do
  indexes :titulo
  indexes :chamada
  indexes :texto
  indexes :description
  indexes :keywords
  indexes :otimizador_de_busca
  indexes :created_at, :sortable => true
  indexes tags.nome, :as => :tag
  indexes usuario.nome, :as => :autor
  where "validacao = '1'"
end
My search function in the controller:
termo = params[:termo].first(50)
@noticias = Noticia.search termo,
  :field_weights => {:tag => 150, :autor => 120, :titulo => 100, :chamada => 80, :otimizador_de_busca => 65, :description => 50, :keywords => 50, :texto => 10},
  :match_mode => :all,
  :page => params[:pagina],
  :sort_mode => :extended,
  :order => "@relevance DESC, created_at DESC",
  :per_page => 15
A few things to note. Firstly, there's a difference between fields and attributes in Sphinx. There's not much to be gained by having created_at as a field; it's far more useful as an attribute (attributes are natively sortable). So, let's update the index definition:
define_index do
  indexes :titulo
  indexes :chamada
  indexes :texto
  indexes :description
  indexes :keywords
  indexes :otimizador_de_busca
  indexes tags.nome, :as => :tag
  indexes usuario.nome, :as => :autor
  has :created_at
  where "validacao = '1'"
end
And then run rake ts:rebuild
so that the change is reflected in your index files and the Sphinx daemon is aware of it too.
As for how you're sorting... you've got a few options. In your example, you're sorting primarily by relevance, and among results with equal relevance scores the newer items are listed first. I think that'll work quite well.
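To make that ordering concrete, here is a small plain-Ruby sketch (no Sphinx involved) that imitates "@relevance DESC, created_at DESC" on in-memory data. The Hit struct, the relevance numbers and the titles are all made up for illustration; real relevance scores come from Sphinx.

```ruby
# Hypothetical stand-in for a search hit: title, relevance score, creation time.
Hit = Struct.new(:titulo, :relevance, :created_at)

# Sort by relevance descending, then by created_at descending,
# imitating the order clause "@relevance DESC, created_at DESC".
def sort_by_relevance_then_date(hits)
  hits.sort_by { |h| [-h.relevance, -h.created_at.to_i] }
end

HITS = [
  Hit.new("old, strong match", 300, Time.utc(2010, 1, 1)),
  Hit.new("new, weak match",   100, Time.utc(2012, 6, 1)),
  Hit.new("new, strong match", 300, Time.utc(2012, 5, 1))
]

SORTED = sort_by_relevance_then_date(HITS)
puts SORTED.map(&:titulo)
```

Note that the date only matters when two hits have exactly the same relevance; a weak but recent match never overtakes a stronger, older one.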
If you want to use Sphinx's time_segments sorting, then that might also work well, as it groups results first by their age (without being too specific), and then automatically orders them by relevance within each age group:
termo = params[:termo].first(50)
@noticias = Noticia.search termo,
  :field_weights => {:tag => 150, :autor => 120, :titulo => 100, :chamada => 80, :otimizador_de_busca => 65, :description => 50, :keywords => 50, :texto => 10},
  :match_mode => :extended,
  :page => params[:pagina],
  :sort_mode => :time_segments,
  :order => :created_at,
  :per_page => 15
I've also changed the match mode to extended, which I'd generally recommend.
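If it helps to picture what time segments do, the following plain-Ruby sketch imitates the grouping on in-memory data. Sphinx's segments are last hour, last day, last week, last month, last three months, and everything else; treating a month as 30 days is my simplification, and the DatedHit struct and sample data are invented for the example.

```ruby
# Hypothetical stand-in for a search hit.
DatedHit = Struct.new(:titulo, :relevance, :created_at)

HOUR = 3600
DAY  = 24 * HOUR
# Upper age bound in seconds for each segment, newest first.
# Assumption: a month is approximated as 30 days.
SEGMENT_LIMITS = [HOUR, DAY, 7 * DAY, 30 * DAY, 90 * DAY]

# Returns 0 for "last hour" up to 5 for "everything else".
def time_segment(created_at, now)
  age = now - created_at # Time - Time gives seconds as a Float
  SEGMENT_LIMITS.index { |limit| age <= limit } || SEGMENT_LIMITS.size
end

# Newest segment first, then relevance descending within a segment.
def sort_by_time_segments(hits, now)
  hits.sort_by { |h| [time_segment(h.created_at, now), -h.relevance] }
end

NOW = Time.utc(2012, 6, 15, 12)
DATED_HITS = [
  DatedHit.new("today, weak",           100, Time.utc(2012, 6, 15, 9)),
  DatedHit.new("last year, strongest",  500, Time.utc(2011, 6, 1)),
  DatedHit.new("five days ago, medium", 200, Time.utc(2012, 6, 10)),
  DatedHit.new("yesterday, strong",     300, Time.utc(2012, 6, 14, 20))
]

BY_SEGMENT = sort_by_time_segments(DATED_HITS, NOW)
puts BY_SEGMENT.map(&:titulo)
```

Here the strongest match ends up last because it is the oldest, which is the trade-off to keep in mind with this sort mode.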
Finally, as you've suggested, you could factor the created_at timestamp into the relevance with an expression - that's up to you. There are probably formulas out there that could help with that, but I think that's extra complexity you probably don't need.
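For a rough idea of what such a formula might look like, here is a plain-Ruby sketch that halves a hit's relevance for every year of age. The half-life decay, the 365-day constant, and the ScoredHit struct are all my own assumptions for illustration, not something Sphinx or thinking-sphinx prescribes; the same idea would have to be rewritten as a Sphinx sort expression to be used in a real search.

```ruby
# Hypothetical stand-in for a search hit.
ScoredHit = Struct.new(:titulo, :relevance, :created_at)

# Assumed tuning knob: relevance is halved for every 365 days of age.
HALF_LIFE_DAYS = 365.0

# Blend relevance and age into one score using exponential decay.
def blended_score(hit, now)
  age_days = (now - hit.created_at) / 86_400.0
  hit.relevance * 0.5**(age_days / HALF_LIFE_DAYS)
end

TODAY = Time.utc(2012, 6, 15)
SCORED_HITS = [
  ScoredHit.new("two years old, strong match", 300, Time.utc(2010, 6, 15)),
  ScoredHit.new("recent, decent match",        200, Time.utc(2012, 6, 1)),
  ScoredHit.new("one year old, strong match",  300, Time.utc(2011, 6, 15))
]

# Highest blended score first.
BLENDED = SCORED_HITS.sort_by { |h| -blended_score(h, TODAY) }
puts BLENDED.map(&:titulo)
```

Unlike the tie-breaking sort, a recent item with a decent score can overtake an older item with a better one, so the half-life would need tuning against real queries.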
If you think that it's more important to have newer results first, then use time segments. If you think that it's more important to have relevant results to the search query first, use the extended sort mode in your own example. I think that one is better, but it's up to you.