
Sphinx - No tokenizer found for language fr, cs etc


I'm using Sphinx for documentation purposes, and I want a spell checker that handles French.

So far, I have done the following:

  • install the sphinxcontrib-spelling extension
sudo pip install sphinxcontrib-spelling
  • install the French dictionary
sudo apt-get install myspell-fr-fr
  • enable the extension in conf.py
extensions = ["sphinxcontrib.spelling"]
spelling_lang = 'fr'
  • add the spelling builder

builder = ["html", "pdf", "spelling"],
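
sphinxcontrib-spelling resolves the tokenizer through PyEnchant, so the failure can be reproduced outside Sphinx (a minimal check, assuming the same Python environment):

import enchant
from enchant.tokenize import get_tokenizer

print(enchant.dict_exists("fr"))         # True once the myspell dictionary is installed
tknzr = get_tokenizer("fr")              # raises TokenizerNotFoundError without an fr tokenizer
print(list(tknzr(u"Ceci est un test")))  # would print [(u'Ceci', 0), (u'est', 5), ...]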

Here is the traceback I get while running Sphinx:

Exception occurred:
  File "/usr/lib/python2.7/dist-packages/sphinx/cmdline.py", line 188, in main
    warningiserror, tags)
  File "/usr/lib/python2.7/dist-packages/sphinx/application.py", line 134, in __init__
    self._init_builder(buildername)
  File "/usr/lib/python2.7/dist-packages/sphinx/application.py", line 194, in _init_builder
    self.builder = builderclass(self)
  File "/usr/lib/python2.7/dist-packages/sphinx/builders/__init__.py", line 57, in __init__
    self.init()
  File "/usr/lib/pymodules/python2.7/sphinxcontrib/spelling.py", line 253, in init
    filters=filters,
  File "/usr/lib/pymodules/python2.7/sphinxcontrib/spelling.py", line 181, in __init__
    self.tokenizer = get_tokenizer(lang, filters)
  File "/usr/lib/python2.7/dist-packages/enchant/tokenize/__init__.py", line 186, in get_tokenizer
    raise TokenizerNotFoundError(msg)
TokenizerNotFoundError: No tokenizer found for language 'fr'

Solution

  • You have to add a tokenizer for the given language to PyEnchant. PyEnchant ships only an English tokenizer module (enchant/tokenize/en.py), and get_tokenizer() raises TokenizerNotFoundError for any language code that has no matching module.

    Quick and dirty solution

    Clone the pyenchant repo and cd into it:

    $ git clone git@github.com:rfk/pyenchant.git
    $ cd pyenchant
    

    Go to the directory where the tokenizers are defined:

    $ cd enchant/tokenize
    

    Copy the existing en.py tokenizer to the language code you need (I was missing cs; you can try fr):

    $ cp en.py cs.py
    $ cp en.py fr.py
    

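    Copying works because get_tokenizer() imports enchant.tokenize.<lang> and uses the tokenize class defined there. A minimal fr.py only needs to provide that class; this sketch assumes the generic basic_tokenize rules are acceptable for French (the real en.py adds English-specific apostrophe handling on top of them):

    # fr.py - minimal sketch; assumptions noted above
    from enchant.tokenize import basic_tokenize

    class tokenize(basic_tokenize):
        """Split French text into words using the generic rules."""
        pass
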
    Install the package from this modified code:

    $ cd ../..  # first return to the dir with `setup.py`
    $ pip install -e .
    

    And now it should work (it works for me).
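
    You can verify that the tokenizer is now found before re-running Sphinx:

    >>> from enchant.tokenize import get_tokenizer
    >>> tknzr = get_tokenizer("fr")  # no longer raises TokenizerNotFoundError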

    A better solution would be to review the copied tokenizer, modify whatever does not fit your language, and contribute the result back to pyenchant.
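
    For French, one adjustment worth considering is the handling of elisions (l'aube, qu'il, ...). If the copied tokenizer leaves their one- and two-letter remnants as separate words, a small PyEnchant filter can skip them; sphinxcontrib-spelling can load such filters through its spelling_filters option. A hedged sketch (ElisionFilter is a hypothetical name, not part of either library):

    from enchant.tokenize import Filter

    class ElisionFilter(Filter):
        """Skip the short remnants of French elisions (illustrative only)."""
        def _skip(self, word):
            return word.lower() in ("l", "d", "j", "m", "n", "s", "t", "c", "qu")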