Multi-language LaTeX document with dozens of languages


I'm a technical writer trying to turn a Python-Sphinx website into a PDF via LaTeX. The manual has a safety regulations and environmental compliance section in 40+ languages. All of these languages appear as-is in the source files, and .rst files have the same Unicode support as .txt files. So if Bulgarian displays correctly in Cyrillic in the source file, I'm assuming it's encoded correctly.

I already know to use either LuaLaTeX or XeLaTeX to render Unicode properly. I've also found that the .tex files Sphinx generates from .rst compile better under LuaLaTeX. Even so, under LuaLaTeX the Greek and Cyrillic text doesn't render at all. Accented letters don't render either, although for some reason the Germanic eth (ð) does.

Everything I've found on multilingual support uses one of several packages that require you to wrap each section in something like \begin{Russian}, and I would have to do that for all 40+ languages. The source files are in a different format and the .tex file is generated automatically, so every update to the manual would overwrite all of that work.

The best solution for me would be to put all the multilingual support in the header and just tell LaTeX, "hey dumb dumb... just render the Unicode text as-is". I'm already maintaining the header by hand. The auto-generated frontispiece and ToC are unsatisfactory, so I keep a better header saved in a separate document and paste it in. Defining all the multilingual support in that header would be ideal.

Any help would be appreciated.

The following is the header generated by Python-Sphinx, with minor adjustments:

%% Generated by Sphinx.
\def\sphinxdocclass{report}
\documentclass[letterpaper,10pt,english]{sphinxmanual}

\ifdefined\pdfpxdimen
   \let\sphinxpxdimen\pdfpxdimen\else\newdimen\sphinxpxdimen
\fi \sphinxpxdimen=.75bp\relax
\ifdefined\pdfimageresolution
    \pdfimageresolution= \numexpr \dimexpr1in\relax/\sphinxpxdimen\relax
\fi
%% let collapsible pdf bookmarks panel have high depth per default
\PassOptionsToPackage{bookmarksdepth=5}{hyperref}

\PassOptionsToPackage{warn}{textcomp}
\usepackage[utf8]{inputenc}
\ifdefined\DeclareUnicodeCharacter
% support both utf8 and utf8x syntaxes
  \ifdefined\DeclareUnicodeCharacterAsOptional
    \def\sphinxDUC#1{\DeclareUnicodeCharacter{"#1}}
  \else
    \let\sphinxDUC\DeclareUnicodeCharacter
  \fi
  \sphinxDUC{00A0}{\nobreakspace}
  \sphinxDUC{2500}{\sphinxunichar{2500}}
  \sphinxDUC{2502}{\sphinxunichar{2502}}
  \sphinxDUC{2514}{\sphinxunichar{2514}}
  \sphinxDUC{251C}{\sphinxunichar{251C}}
  \sphinxDUC{2572}{\textbackslash}
\fi

\usepackage{cmap}
\usepackage[T1]{fontenc}
\usepackage{amsmath,amssymb,amstext}
\usepackage{babel}

\usepackage{tgtermes}
\usepackage{tgheros}
\renewcommand{\ttdefault}{txtt}

\usepackage[Bjarne]{fncychap}
\usepackage{sphinx}

\fvset{fontsize=auto}
\usepackage{geometry}

% Include hyperref last.
\usepackage{hyperref}

% Fix anchor placement for figures with captions.
\usepackage{hypcap}% it must be loaded after hyperref.

% Set up styles of URL: it should be placed after hyperref.
\urlstyle{same}

\usepackage{sphinxmessages}

\title{...}
\date{\today}
\release{...}
\author{...}

\makeindex
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}

The document is almost entirely in English except for one dang section near but not at the end:

- Това е българско
- Αυτό είναι ελληνικό
- Tohle je česky
- Bu türkçe
- Þetta er íslenskt

\end{document}


Solution

  • Caveat: This won't give correct hyphenation or other language-specific settings (e.g. French spacing for punctuation marks), but it will display the text. If you need these features as well, you will have to use babel or polyglossia.
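
If you do need correct hyphenation, the babel route under LuaLaTeX might look roughly like the minimal sketch below. It assumes a recent babel with .ini-based `\babelprovide` support, installed hyphenation patterns, and an installed Noto Serif. The three languages are only an illustration taken from the sample text. Each passage still needs its own markup. In a Sphinx project, that markup could presumably come from `.. raw:: latex` blocks in the .rst source rather than from hand edits to the generated .tex file, so it would survive a rebuild.

```latex
% !TeX TS-program = lualatex
\documentclass{article}
\usepackage[english]{babel}
% Load extra languages from babel's .ini files (illustrative selection);
% the hyphenation patterns must be present in the TeX installation.
\babelprovide[import]{bulgarian}
\babelprovide[import]{greek}
\babelprovide[import]{czech}
% \babelfont loads fontspec; Noto Serif is an assumed, installed font.
\babelfont{rm}{Noto Serif}
\begin{document}
Regular English text.

\foreignlanguage{bulgarian}{Това е българско}

\foreignlanguage{greek}{Αυτό είναι ελληνικό}

\begin{otherlanguage}{czech}
Tohle je česky
\end{otherlanguage}
\end{document}
```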


    The Unicode capabilities of XeLaTeX and LuaLaTeX only fully pay off if you also use a font with good coverage of the characters you need.

    For example, with the Noto Serif font:

    % !TeX TS-program = lualatex
    %% Generated by Sphinx.
    \def\sphinxdocclass{report}
    \documentclass[letterpaper,10pt,english]{sphinxmanual}
    
    \ifdefined\pdfpxdimen
       \let\sphinxpxdimen\pdfpxdimen\else\newdimen\sphinxpxdimen
    \fi \sphinxpxdimen=.75bp\relax
    \ifdefined\pdfimageresolution
        \pdfimageresolution= \numexpr \dimexpr1in\relax/\sphinxpxdimen\relax
    \fi
    %% let collapsible pdf bookmarks panel have high depth per default
    \PassOptionsToPackage{bookmarksdepth=5}{hyperref}
    
    \PassOptionsToPackage{warn}{textcomp}
    \usepackage[utf8]{inputenc}
    \ifdefined\DeclareUnicodeCharacter
    % support both utf8 and utf8x syntaxes
      \ifdefined\DeclareUnicodeCharacterAsOptional
        \def\sphinxDUC#1{\DeclareUnicodeCharacter{"#1}}
      \else
        \let\sphinxDUC\DeclareUnicodeCharacter
      \fi
      \sphinxDUC{00A0}{\nobreakspace}
      \sphinxDUC{2500}{\sphinxunichar{2500}}
      \sphinxDUC{2502}{\sphinxunichar{2502}}
      \sphinxDUC{2514}{\sphinxunichar{2514}}
      \sphinxDUC{251C}{\sphinxunichar{251C}}
      \sphinxDUC{2572}{\textbackslash}
    \fi
    
    \usepackage{cmap}
    \usepackage[T1]{fontenc}
    \usepackage{amsmath,amssymb,amstext}
    \usepackage{babel}
    
    \usepackage{tgtermes}
    \usepackage{tgheros}
    \renewcommand{\ttdefault}{txtt}
    
    \usepackage[Bjarne]{fncychap}
    \usepackage{sphinx}
    
    \fvset{fontsize=auto}
    \usepackage{geometry}
    
    % Include hyperref last.
    \usepackage{hyperref}
    
    % Fix anchor placement for figures with captions.
    \usepackage{hypcap}% it must be loaded after hyperref.
    
    % Set up styles of URL: it should be placed after hyperref.
    \urlstyle{same}
    
    \usepackage{sphinxmessages}
    
    \title{...}
    \date{\today}
    \release{...}
    \author{...}
    
    \makeindex
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    
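    % fontspec selects OpenType/TrueType fonts by name; Noto Serif includes Latin, Greek and Cyrillic glyphs
    \usepackage{fontspec}
    \setmainfont{Noto Serif}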
    
    \begin{document}
    
    The document is almost entirely in English except for one dang section near but not at the end:
    
    - Това е българско
    
    - Αυτό είναι ελληνικό
    
    - Tohle je česky
    
    - Bu türkçe
    
    - Þetta er íslenskt
    
    
    \end{document}
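
The snippet above only changes the main (serif) font. Foreign-language text might also end up in headings, tables or literal blocks. In that case the sans-serif and monospaced families need the same coverage, and as far as I know Sphinx's default heading style uses the sans-serif family. Below is a minimal sketch that assumes the Noto Sans and Noto Sans Mono families are installed. These are a hypothetical choice, and any fonts with Latin, Greek and Cyrillic glyphs would do.

```latex
% !TeX TS-program = lualatex
\documentclass{article}
\usepackage{fontspec}
% Assumed fonts: each family must contain Latin, Greek and Cyrillic glyphs.
\setmainfont{Noto Serif}      % roman text (default body font)
\setsansfont{Noto Sans}       % sans-serif text (\textsf, \sffamily)
\setmonofont{Noto Sans Mono}  % typewriter text (\texttt, verbatim)
\begin{document}
Това е българско.

\textsf{Αυτό είναι ελληνικό.}

\texttt{Þetta er íslenskt.}
\end{document}
```

Sphinx rewrites the .tex file on every build. Instead of pasting these lines into a saved header, you could put them in the `preamble` entry of `latex_elements` in the project's conf.py, together with `latex_engine = 'lualatex'`.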
    


    (To see which fonts on your computer support the characters you want to use, you can use the command-line tool albatross; see e.g. https://stackoverflow.com/a/69721465/2777074)
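
With 40+ languages, a single font family may not cover every script. Noto Serif, for instance, has no CJK ideographs. LuaLaTeX's font loader, luaotfload, supports fallback fonts that are consulted only for glyphs missing from the main font. The sketch below assumes a HarfBuzz-enabled LuaLaTeX (needed for `mode=harf`) and an installed Noto Serif CJK SC. The fallback name `safetyfallback` and the Chinese sample line are hypothetical illustrations, not taken from the manual.

```latex
% !TeX TS-program = lualatex
\documentclass{article}
\usepackage{fontspec}
% Define a fallback chain named "safetyfallback" (hypothetical name);
% fonts in the list are tried in order for glyphs the main font lacks.
\directlua{luaotfload.add_fallback("safetyfallback", {
  "Noto Serif CJK SC:mode=harf;",
})}
\setmainfont{Noto Serif}[RawFeature={fallback=safetyfallback}]
\begin{document}
Αυτό είναι ελληνικό. 这是中文。
\end{document}
```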