I put proximity search to good use with Sphinx, e.g. Twain NEAR/1 Mark
will return
Mark Twain
and
Twain, Mark
But say I had a word form like:
Weekday > Week Day
How could I make any given search use proximity NEAR/3 (or NEAR/X) so that it would find
Week Day
and
Day of Week
I get that in this case there are other ways to skin the cat, but in general I'm looking for a way to keep the multi-word mapping from being pushed as 'Word1 Word2'
i.e. 'Week Day'
because otherwise I get docs such as
'I worked for one entire day before realizing it was going to take a full week'
There's no easy way out of the box. You can perhaps change your app so that it rewrites each 'word' in your search query to "word"~N or, even better, does that only for the words that Sphinx maps through wordforms. Here's an example:
mysql> select *, weight() from idx_min where match('weekday');
+------+-------------------------------------------------------------------------------+------+----------+
| id | doc | a | weight() |
+------+-------------------------------------------------------------------------------+------+----------+
| 1 | Weekday | 1 | 2319 |
| 2 | day of week | 2 | 1319 |
| 3 | I worked for one entire day before realizing it was going to take a full week | 3 | 1319 |
+------+-------------------------------------------------------------------------------+------+----------+
3 rows in set (0.00 sec)
mysql> select *, weight() from idx_min where match('"weekday"');
+------+---------+------+----------+
| id | doc | a | weight() |
+------+---------+------+----------+
| 1 | Weekday | 1 | 2319 |
+------+---------+------+----------+
1 row in set (0.00 sec)
mysql> select *, weight() from idx_min where match('"weekday"~2');
+------+-------------+------+----------+
| id | doc | a | weight() |
+------+-------------+------+----------+
| 1 | Weekday | 1 | 2319 |
| 2 | day of week | 2 | 1319 |
+------+-------------+------+----------+
2 rows in set (0.00 sec)
mysql> select *, weight() from idx_min where match('"entire"~2 "day"~2');
+------+-------------------------------------------------------------------------------+------+----------+
| id | doc | a | weight() |
+------+-------------------------------------------------------------------------------+------+----------+
| 3 | I worked for one entire day before realizing it was going to take a full week | 3 | 1500 |
+------+-------------------------------------------------------------------------------+------+----------+
1 row in set (0.00 sec)
mysql> select *, weight() from idx_min where match('weekday full week');
+------+-------------------------------------------------------------------------------+------+----------+
| id | doc | a | weight() |
+------+-------------------------------------------------------------------------------+------+----------+
| 3 | I worked for one entire day before realizing it was going to take a full week | 3 | 2439 |
+------+-------------------------------------------------------------------------------+------+----------+
1 row in set (0.01 sec)
mysql> select *, weight() from idx_min where match('"weekday"~2 full week');
Empty set (0.00 sec)
The last one would be the best way to go, but you would have to:
1) Parse your query, e.g. like this:
mysql> call keywords('weekday full week', 'idx_min');
+------+-----------+------------+
| qpos | tokenized | normalized |
+------+-----------+------------+
| 1 | weekday | week |
| 2 | weekday | day |
| 3 | full | full |
| 4 | week | week |
+------+-----------+------------+
4 rows in set (0.00 sec)
If the same tokenized word yields two different normalized words, that can be a signal for your app to wrap the tokenized word in "word"~N.
2) Run the query, in this case: "weekday"~2 full week
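The two steps above could be sketched in the app roughly like this. This is only an illustration under a few assumptions: the function name wrap_multiword_forms and its distance parameter are made up for the example, and the rows are hard-coded from the CALL KEYWORDS output shown above rather than fetched from searchd (in a real app you would read them with whatever MySQL-protocol client you already use). It also rebuilds the query from the tokenized words only, so any operators the user typed would need separate handling.

```python
def wrap_multiword_forms(keyword_rows, distance=2):
    """Rebuild a query from CALL KEYWORDS rows, wrapping expanded words.

    keyword_rows: iterable of (qpos, tokenized, normalized) tuples.
    distance: the N used in "word"~N (hypothetical default, tune per index).
    """
    # Group consecutive rows that share the same tokenized word.
    # A repeated normalized form starts a new group, so a word the
    # user typed twice is not collapsed into one.
    groups = []
    for _qpos, tokenized, normalized in sorted(keyword_rows, key=lambda r: int(r[0])):
        if groups and groups[-1][0] == tokenized and normalized not in groups[-1][1]:
            groups[-1][1].append(normalized)
        else:
            groups.append((tokenized, [normalized]))

    parts = []
    for tokenized, normalized_forms in groups:
        if len(normalized_forms) > 1:
            # One typed word became several keywords: a multi-word
            # wordform, so constrain it with the proximity operator.
            parts.append(f'"{tokenized}"~{distance}')
        else:
            parts.append(tokenized)
    return " ".join(parts)


# Rows copied from the CALL KEYWORDS output for 'weekday full week'.
rows = [
    (1, "weekday", "week"),
    (2, "weekday", "day"),
    (3, "full", "full"),
    (4, "week", "week"),
]
print(wrap_multiword_forms(rows))
```

The rewritten string is then what you pass to MATCH() in step 2. Words that Sphinx does not expand are left untouched, so only the wordform-mapped terms get the proximity constraint.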