php, full-text-search, sphinx

Sphinx - Duplicated results when searching


Is there a way to avoid duplicate results when searching in Sphinx? Some documents come back twice because the same id exists in both indexes (main and delta). I know I can resolve this by merging the two indexes, but I would like to know whether there is another way, because running a merge every time could be expensive for the server.


Solution

  • 1) Run your query against both indexes at once, either by creating a distributed index that lists them as locals or agents, or simply by separating the index names with a comma in your search query, e.g.:

    mysql> select * from idx_min;
    +------+--------------------------------------------------------------+------+
    | id   | doc                                                          | a    |
    +------+--------------------------------------------------------------+------+
    |    1 | dog cat parrot juice apple mandarine juice juice apple juice |  123 |
    |    2 | dog cat juice apple apple juice                              |  123 |
    +------+--------------------------------------------------------------+------+
    2 rows in set (0.01 sec)
    
    mysql> select * from idx_min2;
    +------+--------------------------------------------------------------+------+
    | id   | doc                                                          | a    |
    +------+--------------------------------------------------------------+------+
    |    1 | dog cat parrot juice apple mandarine juice juice apple juice |  123 |
    |    2 | dog cat juice apple apple juice                              |  123 |
    +------+--------------------------------------------------------------+------+
    2 rows in set (0.00 sec)
    

    As you can see, both indexes contain documents with ids 1 and 2. However:

    mysql> select * from idx_min, idx_min2;
    +------+--------------------------------------------------------------+------+
    | id   | doc                                                          | a    |
    +------+--------------------------------------------------------------+------+
    |    1 | dog cat parrot juice apple mandarine juice juice apple juice |  123 |
    |    2 | dog cat juice apple apple juice                              |  123 |
    +------+--------------------------------------------------------------+------+
    2 rows in set (0.00 sec)
    

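    returns each document only once. When the same id occurs in several of the queried indexes, the row from the last index in the list is the one returned, so list the delta after the main index.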
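
    For the distributed-index variant, a minimal sphinx.conf sketch might look like the one below. The index name dist_min is hypothetical; idx_min and idx_min2 are the two indexes from the example and are assumed to be defined as local indexes in the same configuration file.

        # Hypothetical distributed index wrapping both local indexes.
        # The locals are listed in the same order as in the comma-separated query.
        index dist_min
        {
            type  = distributed
            local = idx_min
            local = idx_min2
        }

    Querying dist_min (select * from dist_min;) should then behave like the comma-separated query.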

    2) For finer control over de-duplication you can use kill-lists. A kill-list is a list of ids attached to an index; it says that these ids should be suppressed in any preceding indexes. The directives used to define a kill-list, and its exact behaviour, vary with the version you are using (Sphinx 2 / Manticore / Sphinx 3).
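
    As one illustration, in Sphinx 2.x the kill-list is defined on the delta's source with the sql_query_killlist directive. The sketch below assumes an existing source named main, a hypothetical documents table with an updated_at column, and a hypothetical sph_counter table that stores the time of the last main rebuild; none of these names come from the question. Manticore and Sphinx 3 use different directives, so check the documentation for your version.

        # Sphinx 2.x style sketch; source, table and column names are hypothetical.
        source delta : main
        {
            # Rows changed since the main index was last rebuilt.
            sql_query = \
                SELECT id, doc, a FROM documents \
                WHERE updated_at > (SELECT last_indexed FROM sph_counter WHERE counter_id = 1)

            # Ids to suppress in the indexes that precede the delta in the query.
            sql_query_killlist = \
                SELECT id FROM documents \
                WHERE updated_at > (SELECT last_indexed FROM sph_counter WHERE counter_id = 1)
        }

    The delta is then listed last in the query, e.g. select * from idx_min, idx_min2; (assuming idx_min is the main index and idx_min2 the delta), so that its kill-list applies to the main index.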