Force proximity search for a multi-word wordform?


I put proximity search to good use with Sphinx, e.g. Twain NEAR/1 Mark will return

Mark Twain

and

Twain, Mark

But say I have a wordform like:

Weekday > Week Day

How could I make any given search use proximity NEAR/3 (or NEAR/X) so that it finds

Week Day

and

Day of Week

I get that in this case there are other ways to skin the cat, but in general I'm looking for a way to keep the multi-word mapping from being pushed as 'Word1 Word2' (i.e. 'Week Day'), because otherwise I get docs such as

'I worked for one entire day before realizing it was going to take a full week'


Solution

  • There's no easy way to do this out of the box. You could change your app so that it rewrites each 'word' in the search query to "word"~N, or, even better, does that only for the words that Sphinx expands through wordforms. Here's an example:

    mysql> select *, weight() from idx_min where match('weekday');
    +------+-------------------------------------------------------------------------------+------+----------+
    | id   | doc                                                                           | a    | weight() |
    +------+-------------------------------------------------------------------------------+------+----------+
    |    1 | Weekday                                                                       |    1 |     2319 |
    |    2 | day of week                                                                   |    2 |     1319 |
    |    3 | I worked for one entire day before realizing it was going to take a full week |    3 |     1319 |
    +------+-------------------------------------------------------------------------------+------+----------+
    3 rows in set (0.00 sec)
    
    mysql> select *, weight() from idx_min where match('"weekday"');
    +------+---------+------+----------+
    | id   | doc     | a    | weight() |
    +------+---------+------+----------+
    |    1 | Weekday |    1 |     2319 |
    +------+---------+------+----------+
    1 row in set (0.00 sec)
    
    mysql> select *, weight() from idx_min where match('"weekday"~2');
    +------+-------------+------+----------+
    | id   | doc         | a    | weight() |
    +------+-------------+------+----------+
    |    1 | Weekday     |    1 |     2319 |
    |    2 | day of week |    2 |     1319 |
    +------+-------------+------+----------+
    2 rows in set (0.00 sec)
    
    mysql> select *, weight() from idx_min where match('"entire"~2 "day"~2');
    +------+-------------------------------------------------------------------------------+------+----------+
    | id   | doc                                                                           | a    | weight() |
    +------+-------------------------------------------------------------------------------+------+----------+
    |    3 | I worked for one entire day before realizing it was going to take a full week |    3 |     1500 |
    +------+-------------------------------------------------------------------------------+------+----------+
    1 row in set (0.00 sec)
    
    mysql> select *, weight() from idx_min where match('weekday full week');
    +------+-------------------------------------------------------------------------------+------+----------+
    | id   | doc                                                                           | a    | weight() |
    +------+-------------------------------------------------------------------------------+------+----------+
    |    3 | I worked for one entire day before realizing it was going to take a full week |    3 |     2439 |
    +------+-------------------------------------------------------------------------------+------+----------+
    1 row in set (0.01 sec)
    
    mysql> select *, weight() from idx_min where match('"weekday"~2 full week');
    Empty set (0.00 sec)
    

    The last one would be the best way to go, but you would have to:

    1) parse your query, e.g. like this:

    mysql> call keywords('weekday full week', 'idx_min');
    +------+-----------+------------+
    | qpos | tokenized | normalized |
    +------+-----------+------------+
    | 1    | weekday   | week       |
    | 2    | weekday   | day        |
    | 3    | full      | full       |
    | 4    | week      | week       |
    +------+-----------+------------+
    4 rows in set (0.00 sec)
    

    and if you see that the same tokenized word yields 2 different normalized words, your app can take that as a signal to wrap the tokenized word in "word"~N.

    2) run the rewritten query, in this case: "weekday"~2 full week
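The two steps could be automated in the application. Below is a minimal Python sketch, assuming the rows of CALL KEYWORDS have already been fetched as (qpos, tokenized, normalized) tuples (for instance through any MySQL client connected to searchd's SphinxQL listener, which is not shown). The names group_keywords and rewrite_query are hypothetical. It groups consecutive rows sharing a tokenized form and wraps any word that expanded into more than one distinct normalized form as "word"~N, leaving all other words untouched.

```python
from typing import Iterable, List, Tuple

# One row of CALL KEYWORDS output: (qpos, tokenized, normalized).
KeywordRow = Tuple[str, str, str]


def group_keywords(rows: Iterable[KeywordRow]) -> List[Tuple[str, List[str]]]:
    """Collapse consecutive rows that share a tokenized form into one query word.

    Heuristic (assumption): a one-to-many wordform produces consecutive rows
    with the same tokenized value and distinct normalized values. A repeated
    normalized value starts a new group, so "week week" stays two words.
    """
    groups: List[Tuple[str, List[str]]] = []
    # Order rows by qpos; int() accepts both string and integer qpos values.
    for _qpos, tokenized, normalized in sorted(rows, key=lambda r: int(r[0])):
        current = groups[-1] if groups else None
        if current and current[0] == tokenized and normalized not in current[1]:
            current[1].append(normalized)  # same word, another expanded form
        else:
            groups.append((tokenized, [normalized]))  # a new query word
    return groups


def rewrite_query(rows: Iterable[KeywordRow], n: int = 2) -> str:
    """Wrap each word that expanded into 2+ normalized forms as "word"~n."""
    parts = []
    for tokenized, normalized in group_keywords(rows):
        if len(normalized) > 1:
            parts.append(f'"{tokenized}"~{n}')  # force proximity on the expansion
        else:
            parts.append(tokenized)  # plain keyword, leave as typed
    return " ".join(parts)


# Rows copied from the CALL KEYWORDS output above.
rows = [
    ("1", "weekday", "week"),
    ("2", "weekday", "day"),
    ("3", "full", "full"),
    ("4", "week", "week"),
]
print(rewrite_query(rows))  # prints: "weekday"~2 full week
```

Grouping on consecutive rows rather than on qpos is needed because qpos counts the expanded keywords (weekday occupies positions 1 and 2), not the words the user actually typed.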