Search code examples
databasestringfull-text-searchsphinx

In sphinx search, how do I match by a filled string?


Hopefully this is a simple one.

I am trying to do a search for all records where the firstname is john and the lastname is not an empty string (''). In regular SQL, this would look like...

select id, firstname, lastname from users where firstname = 'john' and lastname != '';

With sphinx's extended query syntax, from what I understand in their documentation, it should look like this.

select id, firstname, lastname from users where match('@firstname john @lastname !\'\'');

However, with the above query, I still get blank last names.

+---------+------------------+----------+
| id      | firstname        | lastname |
+---------+------------------+----------+
|  110809 | John             |          |
|  313681 | John             |          |
|  520045 | John             |          |
|  554136 | John             |          |

If I try this query:

select id, firstname, lastname from users where match('@firstname john')

I get exactly the same results as above, making me believe the lastname clause is not doing a thing.

Has anyone had to do this with sphinxsearch before? Any pointers or help would be appreciated.


Solution

  • Use index_field_lengths = 1 in your index (plan or RT) config. After that you should automaticaly have an attribute <field_name>_len you can use to filter out (or find) documents with empty field contents, e.g.

    mysql> desc table;
    +----------+------------+
    | Field    | Type       |
    +----------+------------+
    | id       | bigint     |
    | name     | field      |
    | a        | string     |
    | name_len | tokencount |
    +----------+------------+
    4 rows in set (0.00 sec)
    
    mysql> insert into table values(1,'abc', 'abc');
    Query OK, 1 row affected (0.00 sec)
    
    mysql> insert into table values(2,'', '');
    Query OK, 1 row affected (0.00 sec)
    
    mysql> select * from table where name_len != 0;
    +------+------+----------+
    | id   | a    | name_len |
    +------+------+----------+
    |    1 | abc  | 1        |
    +------+------+----------+
    1 row in set (0.00 sec)
    

    index_field_lengths requires reindexation for a plain index or re-creation for an RT index.