
Sphinx - Treats a UTF-8 character as a space


I have the string "seule la présentation".

When I run a phrase search for "pr", Sphinx matches this string, but it should not, because the string contains no word "pr".

But when I search for "pre", it does not match.

The problem seems to be the UTF-8 character é. Sphinx ignores this character while indexing and treats the text before it as a separate word.

Here is a sample Sphinx query, using match mode SPH_MATCH_EXTENDED:

@name: "pr"

Is there any workaround for this?


Solution

  • I'm not an expert on this, but I know that with Sphinx you have to explicitly list which characters are considered part of 'words' (everything else is treated as a separator), via charset_table:

    http://sphinxsearch.com/docs/current/conf-charset-table.html

    So you would need to include these character(s) in charset_table for them to be indexable (with or without 'folding' them to their non-diacritic equivalents).

    This wiki page, http://sphinxsearch.com/wiki/doku.php?id=charset_tables, lists some ready-made tables that you may be able to copy and paste.
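
To make the effect concrete, here is a small Python sketch of the idea, not Sphinx's actual tokenizer. It assumes a simplified rule: any character missing from the allowed set acts as a separator. The `tokenize` function and both character sets are hypothetical names made up for this illustration, and the ASCII-only set is only a rough stand-in for a charset_table that lacks é.

```python
def tokenize(text, word_chars):
    """Split text into words; any character not in word_chars is a separator."""
    tokens, current = [], []
    for ch in text.lower():
        if ch in word_chars:
            current.append(ch)
        elif current:
            # A non-word character ends the current token.
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


# Hypothetical stand-in for a charset_table without accented letters.
ASCII_ONLY = set("abcdefghijklmnopqrstuvwxyz0123456789_")
# The same set with U+00E9 (e with acute accent) added as a word character.
WITH_E_ACUTE = ASCII_ONLY | {"\u00e9"}

text = "seule la pr\u00e9sentation"
print(tokenize(text, ASCII_ONLY))    # ['seule', 'la', 'pr', 'sentation']
print(tokenize(text, WITH_E_ACUTE))  # ['seule', 'la', 'présentation']
```

With the first set, "pr" becomes a word of its own, which would explain why the phrase search "pr" matches and "pre" does not. In sphinx.conf, the corresponding change would be along the lines of `charset_table = 0..9, A..Z->a..z, _, a..z, U+C9->U+E9, U+E9` (an illustrative line, not taken from the original answer; it keeps é as a distinct letter rather than folding it to e). The index has to be rebuilt after charset_table changes.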