I'm having an issue with specific entries in my wordforms file that are not being interpreted as expected.
Here are a couple of examples:
1/48 > forty-eighth
1/96 > ninety-sixth
As you can see, these entries contain both slashes and hyphens, which may be related to my issue.
For some reason, Sphinx doesn't equate each fraction with its spelled-out version. Search results for "1/48" differ from those for "forty-eighth", when they should be identical. In other words, the mapping between these equivalent forms is not working.
In my Sphinx config, I have the forward slash (/) set as a blend character, so I assume that the fraction is being recognized properly.
In support of that belief, the following wordforms entry does work correctly:
1/4 > fourth
Does anyone have any idea why my multi-term synonyms would not be working as expected?
I have tried replacing the hyphen with a space, but this doesn't change the result at all. Would it help to change the order of the terms (i.e., on which side of the ">" they should be placed)?
Thank you very much for any help.
When working with special characters in Sphinx, it is always good to keep the following in mind:
By default, the Sphinx tokenizer treats unknown characters (anything not listed in charset_table) as whitespace. See https://sphinxsearch.com/blog/2014/11/26/sphinx-text-processing-pipeline/
That has given me weird results too when using wordforms.
I would suggest you add the hyphen to charset_table so that ninety-sixth becomes one word. ignore_chars is also an option, but then you will be searching for ninetysixth instead.
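To make the difference between the two options concrete, here is a rough Python sketch. It is only a toy model of the tokenizer, not Sphinx code: the tokenize function, the BASE_CHARSET set, and the extra_chars and ignore_chars parameters are names I made up for illustration, and the model does not cover blend_chars or the wordforms lookup itself.

```python
import string

# Assumption: a simplified stand-in for a default charset
# (lowercase ASCII letters, digits, underscore). Not the real Sphinx table.
BASE_CHARSET = set(string.ascii_lowercase + string.digits + "_")


def tokenize(text, extra_chars="", ignore_chars=""):
    """Toy tokenizer: any character outside the charset acts as a separator.

    extra_chars  -- hypothetical equivalent of adding characters to charset_table
    ignore_chars -- hypothetical equivalent of ignore_chars (characters are dropped)
    """
    charset = BASE_CHARSET | set(extra_chars)
    tokens, current = [], []
    for ch in text.lower():
        if ch in ignore_chars:
            continue  # dropped entirely, so the neighbouring characters are glued together
        if ch in charset:
            current.append(ch)
        elif current:
            # unknown character: close the current token, like whitespace would
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


print(tokenize("ninety-sixth"))                    # ['ninety', 'sixth']
print(tokenize("ninety-sixth", extra_chars="-"))   # ['ninety-sixth']
print(tokenize("ninety-sixth", ignore_chars="-"))  # ['ninetysixth']
```

In this model the hyphenated form only survives as a single keyword in the second case, which is presumably what the right-hand side of the wordforms entry needs in order to match.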
Much depends on the rest of your dataset and use cases, of course.