Tags: php, search, sphinx, ranking, infix-notation

Sphinx search for exact match and then infix matches


I am using Sphinx to provide search for a website, and I've run into a snag with returning relevant results.

To keep my question simple, let's assume that I have two fields, @title and @body, weighted 100 and 15 respectively. When I search for short words such as 'in', I would like exact matches for that term ranked highest, followed by matches for 'in*|*in|*in*' ranked slightly lower. Is there any way to get this kind of specificity in a search?

Example results for 'in':

  1. Indian Food
  2. In The Middle
  3. Document about Latin

Some relevant settings are:

In sphinx.conf:

morphology              = stem_en
charset_type            = utf-8
min_word_len            = 2
min_prefix_len          = 0
min_infix_len           = 2
enable_star             = 1

In search.php:

$sp->SetMatchMode( SPH_MATCH_EXTENDED2 );
$sp->SetRankingMode( SPH_RANK_PROXIMITY_BM25 );
$sp->SetFieldWeights ( array('title' => 100, 'body' => 15) );

Also, as a side note: in some cases partial matches don't show up in the results at all. For example, searching for Cow does not return Cowboy. Searching for Cowb and Cowbo didn't return it either; only when I typed the full word Cowboy did I get the expected result. Any thoughts?


This question is along the same lines as this previous SO question, but I hope I've given enough detail about my problem and what I've tried to warrant a solution.


Solution

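  • It looks like, morphologically, Cow is not related to Cowboy: stem_en does not reduce Cowboy to Cow. And with enable_star = 1, Sphinx only treats a keyword as a prefix or infix when it carries an explicit *, so a bare Cow (or Cowb, Cowbo) matches only that whole word.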

    You could solve it in two ways:

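    1. Use a wordforms file (referenced by the index's wordforms directive) containing the mapping cow > cowboy.
    2. Since enable_star is on, you could change the query from "Cow" to "Cow*", which will find all words starting with "Cow". The infix index built for min_infix_len = 2 also serves such prefix queries.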

    Regarding different ranking for "in" and "*in*", I would suggest having two body fields in the index, say body and body_star, both filled with the same content from the body field.

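    In search.php:

    require_once('sphinxapi.php'); // PHP client API shipped with Sphinx
    $sp = new SphinxClient();
    $sp->SetRankingMode( SPH_RANK_PROXIMITY_BM25 );
    $sp->SetMatchMode( SPH_MATCH_EXTENDED2 );
    $sp->SetFieldWeights ( array('title' => 20, 'body' => 15, 'body_star' => 5) );
    // Join the field-limited groups with | (OR); written side by side they would be ANDed
    $result = $sp->Query("(@title in) | (@body in) | (@body_star *in*)");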
    

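    This should do the trick: a document containing the word "in" in its body matches both @body and @body_star, while one that only contains it inside another word, such as "Latin", matches @body_star alone and therefore gets a lower weight.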
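The "Cow" to "Cow*" rewrite from the second suggestion can be automated for user-entered queries. What follows is a minimal sketch, assuming the question's enable_star = 1 and min_infix_len = 2; add_prefix_wildcards is a hypothetical helper, not part of the Sphinx API, and its escaping only approximates the character set that SphinxClient::EscapeString() handles.

```php
<?php
// Hypothetical helper (not part of the Sphinx API): appends "*" to each
// keyword so that, with enable_star = 1, "Cow" is sent as "Cow*" and can
// also match "Cowboy". Requires PHP 7+ (scalar type declarations).
function add_prefix_wildcards(string $userInput, int $minInfixLen = 2): string
{
    // Split the raw input on whitespace, dropping empty pieces.
    $words = preg_split('/\s+/', $userInput, -1, PREG_SPLIT_NO_EMPTY);
    $out = [];
    foreach ($words as $word) {
        // Escape characters that act as operators in the extended query
        // syntax (roughly the set SphinxClient::EscapeString() escapes).
        $escaped = addcslashes($word, '\\()|-!@~"&/^$=<');
        // Only star words at least min_infix_len long; strlen() counts bytes,
        // which equals the character count for ASCII input.
        $out[] = strlen($word) >= $minInfixLen ? $escaped . '*' : $escaped;
    }
    return implode(' ', $out);
}
```

In search.php the result would then be passed straight to the client, for example `$sp->Query(add_prefix_wildcards($_GET['q']))`. Words shorter than min_infix_len are left unstarred because the infix index holds nothing shorter than that.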
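On the sphinx.conf side, the body_star field can be produced by selecting the body column a second time under another alias. The fragment below is a sketch assuming a SQL source; the source and index names, the table name documents and the index path are placeholders, and the remaining index settings are copied from the question. Because min_infix_len applies to every field while enable_star = 1 only expands keywords carrying a *, a bare in against @body still matches only the whole word. The index has to be rebuilt with indexer after the field is added.

```
source documents_src
{
    # type, sql_host, sql_user, sql_pass, sql_db etc. as in the existing source

    # "documents" is a placeholder table name. Selecting body twice under
    # different names makes Sphinx index it as two full-text fields.
    sql_query = SELECT id, title, body, body AS body_star FROM documents
}

index documents
{
    source          = documents_src
    # placeholder path
    path            = /var/lib/sphinx/documents
    morphology      = stem_en
    charset_type    = utf-8
    min_word_len    = 2
    min_prefix_len  = 0
    min_infix_len   = 2
    enable_star     = 1
}
```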
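To apply the body/body_star idea to arbitrary user input instead of the hard-coded "in", the query string can be generated. This is a sketch only: build_ranked_query is a hypothetical helper, not part of the Sphinx API, and it assumes the title, body and body_star fields described in the answer. Each field-limited group is wrapped in parentheses and joined with |, since groups written side by side would all have to match.

```php
<?php
// Hypothetical helper (not part of the Sphinx API): builds an extended-syntax
// query matching the words exactly in title or body, or as infixes in the
// duplicated body_star field. Groups are joined with "|" (OR).
function build_ranked_query(string $userInput): string
{
    $words = preg_split('/\s+/', $userInput, -1, PREG_SPLIT_NO_EMPTY);
    if (count($words) === 0) {
        return '';
    }
    $exact = [];
    $infix = [];
    foreach ($words as $word) {
        // Escape extended-syntax operators (roughly what
        // SphinxClient::EscapeString() does).
        $escaped = addcslashes($word, '\\()|-!@~"&/^$=<');
        $exact[] = $escaped;
        $infix[] = '*' . $escaped . '*';
    }
    $exactPart = implode(' ', $exact);
    $infixPart = implode(' ', $infix);
    return "(@title $exactPart) | (@body $exactPart) | (@body_star $infixPart)";
}
```

With the answer's field weights, `$sp->Query(build_ranked_query('in'))` sends `(@title in) | (@body in) | (@body_star *in*)`. A whole-word hit in the body scores on both body and body_star, while a hit that exists only inside a longer word scores on body_star alone.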