sphinx non-ascii-characters

What string should I pass to match() in "select ... where match('parameter')" when charset_table covers non-English characters?


To support Chinese (CJK in general), I added a charset_table that includes CJK characters to sphinx.conf. After indexing and starting searchd, I connected with the mysql client:

mysql -h 0 -P 9306 

I can fetch all records with the following command:

mysql> select * from excursion_core;

But if I append a where match clause, I get nothing:

mysql> select * from excursion_core where match('kike');
Empty set (0.00 sec)

To check whether the new charset_table causes this, I switched back to the old English-only sphinx.conf and ran the same command:

mysql> select * from excursion_core where match('kike');

This time I get all the records matching the string 'kike'.

I guess I need to pass a different string when the CJK charset_table is in use, but I don't know what it should be. Any advice is welcome!


Solution

  • I found the answer while working on another, similar question of mine: How to enable ActiveRecord to support CJK query? Running locale printed the following:

    LANG=
    LC_COLLATE="C"
    LC_CTYPE="UTF-8"
    LC_MESSAGES="C"
    LC_MONETARY="C"
    LC_NUMERIC="C"
    LC_TIME="C"
    LC_ALL=
    

    The "C" locale is not a UTF-8 locale, so I ran:

    export LANG=en_US.UTF-8
    

    After that, locale printed:

    LANG="en_US.UTF-8"
    LC_COLLATE="en_US.UTF-8"
    LC_CTYPE="UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_MONETARY="en_US.UTF-8"
    LC_NUMERIC="en_US.UTF-8"
    LC_TIME="en_US.UTF-8"
    LC_ALL=
    

    Now the query from the question returns the right records:

    mysql> select * from excursion_core where match('kike');
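
    The check and fix above could be wrapped in a small shell snippet that runs before the mysql client is started. This is only a sketch: is_utf8_locale and ensure_utf8_lang are hypothetical helper names, and it assumes the en_US.UTF-8 locale is installed on the machine, as it was in my case.

```shell
#!/bin/sh
# Sketch: make sure LANG names a UTF-8 locale before starting the client.
# is_utf8_locale and ensure_utf8_lang are hypothetical helpers, not part
# of Sphinx or of the mysql client.

# Succeeds when the given locale name ends in a UTF-8 codeset suffix.
is_utf8_locale() {
    case "$1" in
        *.UTF-8 | *.utf-8 | *.UTF8 | *.utf8) return 0 ;;
        *) return 1 ;;
    esac
}

# Exports LANG=en_US.UTF-8 (the value used above) unless LANG already
# names a UTF-8 locale. Assumes en_US.UTF-8 exists on this system.
ensure_utf8_lang() {
    if ! is_utf8_locale "${LANG:-}"; then
        LANG=en_US.UTF-8
        export LANG
    fi
}

ensure_utf8_lang
# Then connect as before, for example: mysql -h 0 -P 9306
```

    Putting the same export line into the shell profile would have the same effect; the function form just avoids overwriting a LANG that is already UTF-8.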