How to Internationalize Sphinx (i18n)


Starting from the defaults produced by sphinx-quickstart, how do I get language-specific directories like the ones Read The Docs uses?

I'm starting a new Sphinx documentation site that's hosted by GitHub Pages, built with GitHub Actions, and using the Read The Docs theme. Basically I'm trying to re-create Read The Docs but without ads or server-side search.

I won't be able to translate anything for a long time, but I want my new project to be ready for translation later. In particular, I want permalinks to my documentation to stay the same after I add translations. To that end, I'd like the documentation URL to include the language, for example /en/stable/.

I followed the Sphinx Guide to Internationalization and set the language and locale_dirs variables in conf.py:

language = 'en'
locale_dirs = ['locale/']
gettext_compact = True

Unfortunately, after the changes above, make html and make -e SPHINXOPTS="-D language='en'" html still produce files without the en subdirectory:

user@host:~/docs$ tree -L 2 _build
_build
├── doctrees
│   ├── environment.pickle
│   └── index.doctree
└── html
    ├── genindex.html
    ├── index.html
    ├── objects.inv
    ├── search.html
    ├── searchindex.js
    ├── _sources
    └── _static

Am I missing something, or is the documentation missing something? How do I set up a fresh Sphinx install with the defaults so that make builds language-specific html?


Solution

  • The undocumented truth is that Sphinx doesn't do this. Its i18n functionality stops short of a workflow for building all of your language-specific directories. It's implied that you have to write your own scripts that wrap sphinx-build to set this up.

    To translate your Sphinx reST files into another language, you don't have to edit the '.rst' files themselves. Sphinx already understands what a block of text looks like, and it can divide heading strings, captions, double-newline-separated paragraphs, etc. into unique "source strings" (msgid) and put them into '.pot' source-language files and '.po' destination-language files.

    First, run make gettext from the 'docs/' directory. This tells Sphinx to parse your reST files, find the strings to be translated, and record each one as a msgid.

    user@host:~/rtd-github-pages$ cd docs/
    user@host:~/rtd-github-pages/docs$ ls
    autodoc.rst  buildDocs.sh  conf.py.orig  locales   Makefile
    _build       conf.py       index.rst     make.bat  _static
    user@host:~/rtd-github-pages/docs$ 
     
    user@host:~/rtd-github-pages/docs$ make gettext
    Running Sphinx v1.8.4
    making output directory...
    building [gettext]: targets for 0 template files
    building [gettext]: targets for 2 source files that are out of date
    updating environment: 2 added, 0 changed, 0 removed
    reading sources... [ 50%] autodoc
    reading sources... [100%] index                                                 
    looking for now-outdated files... none found
    pickling environment... done
    checking consistency... done
    preparing documents... done
    writing output... [100%] index                                                  
    writing message catalogs... [100%] index                                        
    build succeeded.
     
    The message catalogs are in _build/gettext.
    user@host:~/rtd-github-pages/docs$ 
    

    The above execution should create the following files:

    user@host:~/rtd-github-pages/docs$ ls _build/gettext/
    autodoc.pot  index.pot
    user@host:~/rtd-github-pages/docs$ 
    

    Here's a snippet from '_build/gettext/index.pot' showing two strings on our documentation's main page that we'll translate from English to Spanish.

    user@host:~/rtd-github-pages/docs$ grep -m2 -A2 .rst _build/gettext/index.pot 
    #: ../../index.rst:7
    msgid "Welcome to helloWorld's documentation!"
    msgstr ""
    --
    #: ../../index.rst:9
    msgid "Contents:"
    msgstr ""
    user@host:~/rtd-github-pages/docs$ 
    

    Next, let's tell Sphinx to prepare some Spanish destination-language '.po' files from the source-language '.pot' files generated above.

    Before proceeding with this step, you'll need to install sphinx-intl and the Python Stemmer module. On a Debian-based distro, you can do so with the following command.

    sudo apt-get install -y sphinx-intl python3-stemmer
    

    Execute the following command to prepare our Spanish-specific translation files.

    user@host:~/rtd-github-pages/docs$ sphinx-intl update -p _build/gettext -l es
    Create: locales/es/LC_MESSAGES/index.po
    Create: locales/es/LC_MESSAGES/autodoc.po
    user@host:~/rtd-github-pages/docs$ 
    

    The above execution created two '.po' files: one for each of our '.pot' source-language files, which correlate directly to each of our two '.rst' files (index.rst and autodoc.rst). Perfect.

    If we grep the new Spanish-specific 'docs/locales/es/LC_MESSAGES/index.po' file, we see it has the same contents as the source '.pot' file.

    user@host:~/rtd-github-pages/docs$ grep -m2 -A2 .rst locales/es/LC_MESSAGES/index.po 
    #: ../../index.rst:7
    msgid "Welcome to helloWorld's documentation!"
    msgstr ""
    --
    #: ../../index.rst:9
    msgid "Contents:"
    msgstr ""
    user@host:~/rtd-github-pages/docs$ 
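
The empty msgstr lines shown above are the strings still waiting for a translation. As a rough illustration, here is a minimal Python sketch that lists them. It assumes every msgid and msgstr fits on a single line, as in this example; the function name untranslated and the sample text are made up for illustration, and real catalogs with multi-line strings would need a proper PO parser.

```python
import re

# Matches a single-line msgid immediately followed by a single-line msgstr.
# Multi-line (continued) strings are deliberately not handled in this sketch.
ENTRY_RE = re.compile(r'^msgid "(.*)"\nmsgstr "(.*)"$', re.MULTILINE)


def untranslated(po_text):
    """Return the msgids whose msgstr is still empty."""
    return [
        msgid
        for msgid, msgstr in ENTRY_RE.findall(po_text)
        if msgid and not msgstr  # skip the header entry, whose msgid is ""
    ]


# Hypothetical catalog text: one pending entry and one finished entry.
sample = '''#: ../../index.rst:7
msgid "Welcome to helloWorld's documentation!"
msgstr ""

#: ../../index.rst:9
msgid "Contents:"
msgstr "Contenidos:"
'''

print(untranslated(sample))
```

Running something like this over each '.po' file gives a quick checklist of what is left to translate.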
    

    These language-specific '.po' files are where we actually do the translating. A large project would probably want to use a dedicated program or service to translate these files. But, for clarity, we'll just edit the files directly.

    user@host:~/rtd-github-pages/docs$ perl -pi -0e "s^(msgid \"Welcome to helloWorld's documentation\!\"\n)msgstr \"\"^\1msgstr \"¡Bienvenido a la documentación de helloWorld\!\"^" locales/es/LC_MESSAGES/index.po
    user@host:~/rtd-github-pages/docs$ perl -pi -0e "s^(msgid \"Contents:\"\n)msgstr \"\"^\1msgstr \"Contenidos:\"^" locales/es/LC_MESSAGES/index.po
    user@host:~/rtd-github-pages/docs$ 
     
    user@host:~/rtd-github-pages/docs$ grep -m2 -A2 .rst locales/es/LC_MESSAGES/index.po 
    #: ../../index.rst:7
    msgid "Welcome to helloWorld's documentation!"
    msgstr "¡Bienvenido a la documentación de helloWorld!"
    --
    #: ../../index.rst:9
    msgid "Contents:"
    msgstr "Contenidos:"
    user@host:~/rtd-github-pages/docs$ 
    

    As you can see, the above execution filled in each empty msgstr "" with the Spanish translation of the original (English) msgid line above it.
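
For readers who would rather not use perl one-liners, the same substitution can be sketched in Python. This is only an illustration under the same assumption as the commands above (a single-line msgid directly followed by an empty msgstr); the function name set_translation is hypothetical and not part of Sphinx or sphinx-intl.

```python
def set_translation(po_text, msgid, translation):
    """Fill in the empty msgstr that directly follows the given msgid."""
    def quote(s):
        # Escape backslashes and double quotes as PO string syntax requires.
        return '"' + s.replace('\\', '\\\\').replace('"', '\\"') + '"'

    old = 'msgid ' + quote(msgid) + '\nmsgstr ""\n'
    new = 'msgid ' + quote(msgid) + '\nmsgstr ' + quote(translation) + '\n'
    if old not in po_text:
        # Either the msgid is absent or it already has a translation.
        raise KeyError(msgid)
    return po_text.replace(old, new, 1)


# Hypothetical one-entry catalog used to show the effect.
po = 'msgid "Contents:"\nmsgstr ""\n'
print(set_translation(po, "Contents:", "Contenidos:"))
```

To apply it to a real file, read the text of locales/es/LC_MESSAGES/index.po, pass it through the function, and write the result back. For anything beyond a demo, a dedicated PO library or translation tool is likely a safer choice than string replacement.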

    Now let's build two versions of our html static content: [1] in English and [2] in Spanish.

    user@host:~/rtd-github-pages/docs$ sphinx-build -b html . _build/html/en -D language='en'
    Running Sphinx v1.8.4
    loading translations [en]... done
    making output directory...
    building [mo]: targets for 0 po files that are out of date
    building : targets for 2 source files that are out of date
    updating environment: 2 added, 0 changed, 0 removed
    reading sources... [ 50%] autodoc
    reading sources... [100%] index                                                 
    looking for now-outdated files... none found
    pickling environment... done
    checking consistency... done
    preparing documents... done
    writing output... [100%] index                                                  
    generating indices... genindex py-modindex
    highlighting module code... [100%] helloWorld                                   
    writing additional pages... search
    copying static files... done
    copying extra files... done
    dumping search index in English (code: en) ... done
    dumping object inventory... done
    build succeeded.
     
    The HTML pages are in _build/html/en.
    user@host:~/rtd-github-pages/docs$
     
    user@host:~/rtd-github-pages/docs$ sphinx-build -b html . _build/html/es -D language='es'
    Running Sphinx v1.8.4
    loading translations [es]... done
    making output directory...
    building [mo]: targets for 1 po files that are out of date
    writing output... [100%] locales/es/LC_MESSAGES/index.mo                        
    building : targets for 2 source files that are out of date
    updating environment: 2 added, 0 changed, 0 removed
    reading sources... [ 50%] autodoc
    reading sources... [100%] index                                                 
    looking for now-outdated files... none found
    pickling environment... done
    checking consistency... done
    preparing documents... done
    writing output... [100%] index                                                  
    generating indices... genindex py-modindex
    highlighting module code... [100%] helloWorld                                   
    writing additional pages... search
    copying static files... done
    copying extra files... done
    dumping search index in Spanish (code: es) ... done
    dumping object inventory... done
    build succeeded.
     
    The HTML pages are in _build/html/es.
    user@host:~/rtd-github-pages/docs$
     
    user@host:~/rtd-github-pages/docs$ firefox _build/html/en/index.html _build/html/es/index.html &
    [1] 12134
    user@host:~/rtd-github-pages/docs$ 
    

    The firefox command in the above execution should open your browser with two tabs: (1) in English and (2) in Spanish.
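
Since Sphinx leaves the per-language loop to you, the two sphinx-build invocations above can be generated by a small wrapper. The following is a minimal sketch, not an official Sphinx workflow: the function names, the language list, and the _build/html output root are assumptions chosen to match the commands shown in this answer.

```python
import subprocess
from pathlib import Path


def sphinx_build_command(language, source_dir=".", out_root="_build/html"):
    """Return the sphinx-build argv that writes one language to its own subdirectory."""
    out_dir = Path(out_root) / language
    return [
        "sphinx-build", "-b", "html",
        str(source_dir), out_dir.as_posix(),
        "-D", f"language={language}",
    ]


def build_all(languages, **kwargs):
    """Run one sphinx-build per language, stopping at the first failure."""
    for language in languages:
        subprocess.run(sphinx_build_command(language, **kwargs), check=True)


# Show the commands that would be run for a hypothetical language list.
for lang in ["en", "es"]:
    print(" ".join(sphinx_build_command(lang)))
```

Calling build_all(["en", "es"]) from the 'docs/' directory would then run both builds. Starting with only "en" still puts the English site under an en/ directory, so existing permalinks should not need to change when more languages are added later.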

    For more information, see this article on how to set up internationalization in Sphinx.