
Sphinx for recognizing Digits


Because I don't want 15.50 to be indexed as 15 50, I added a number of entries to the exceptions file referenced in my Sphinx configuration file, e.g.

1.50 => 1.50

However, that quickly gets out of hand.

I tried doing it with a regexp instead, e.g.

([0-9]{1,3})\.([0-9]{2}) => \1.\2

Yet apparently it is too late to do this with a regexp, as the period has already been ignored. Ideally I could force this operation to happen at the same stage as exceptions, so that I could handle all permutations at once rather than one by one in the exceptions file (which gets totally unwieldy for the occasional numbers with 3 or more decimal places, such as 32.243).

Can I force this regexp_filter to run before the . is ignored, the way exceptions do, or am I forced to add the . to the Sphinx character set?


Solution

  • I don't think the problem is that the period is ignored before the regexp runs; it's that the period is still ignored after the replacement. Exceptions work as exceptions to the normal tokenizing rules (matching words don't go through the rest of the pipeline), which is why they work for you. Regexp filters, by contrast, only transform the data before normal processing; that processing is not bypassed, so the period your replacement writes back is still treated as a separator.

    Do look at blend_chars (http://sphinxsearch.com/docs/current.html#conf-blend-chars); declaring the period as a blended character may help you.
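
    As a rough sketch of the blend_chars suggestion, the index definition might look something like the fragment below. The index name `products` is made up for illustration, and the blend_mode line is an optional assumption rather than something the question requires. The period is written as U+2E to avoid any confusion with the `..` range syntax. Treat this as a starting point to test against your own data, not a verified configuration.

    ```
    index products
    {
        # ... source, path and other settings omitted ...

        # Treat the period (U+2E) as a blended character, so "15.50" is
        # indexed as the whole token "15.50" as well as the separate
        # tokens "15" and "50".
        blend_chars = U+2E

        # Optional (assumption): also index the variant with trailing
        # blended characters trimmed, so a sentence-ending "15.50." can
        # still match "15.50".
        blend_mode = trim_none, trim_tail
    }
    ```

    Changes to these settings only take effect after the index is rebuilt. With the period blended, the per-number entries in the exceptions file should no longer be needed for this purpose.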