
pandoc does not recognize Chinese characters


I want to compile, with pandoc, a Markdown document containing CJK text (Chinese, actually). I read that the --latex-engine=xelatex option allows pandoc to handle Unicode characters.

However, I tried this file, cjk.md:

Hello
你好

and compiled it (in bash) with

pandoc -s -o cjk.pdf --latex-engine=xelatex cjk.md

But the resulting PDF shows only Hello; 你好 is missing. Have I missed something? pandoc is up to date, and I'm using a 2012 MacBook Air running macOS Sierra. XeLaTeX is properly installed (as part of MacTeX, I believe): compiling with XeLaTeX from TeXstudio works without any problem.


Solution

  • Solved. TL;DR: it is not enough to switch the engine from pdflatex to xelatex; the xeCJK package must be loaded as well. Where to load it is explained below.


    Edit: everything below can also be achieved simply by setting these pandoc template variables in a YAML metadata block:

    ---
    CJKmainfont: STSong
    CJKoptions:
      - BoldFont=STHeiti
      - ItalicFont=STKaiti
    ---
    
    Hello 你好
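    To try this metadata-only route on its own, the bash sketch below writes the sample file with that YAML block at the top. The function name write_cjk_sample is hypothetical; STSong, STHeiti and STKaiti are the macOS fonts used throughout this answer, so substitute fonts installed on your system.

```bash
# Hypothetical helper: write a Markdown file whose YAML metadata block sets
# pandoc's CJKmainfont and CJKoptions template variables (macOS fonts, as in this answer).
write_cjk_sample() {
  local out="$1"
  # The quoted 'EOF' stops the shell from expanding anything inside the here-document.
  cat > "$out" <<'EOF'
---
CJKmainfont: STSong
CJKoptions:
  - BoldFont=STHeiti
  - ItalicFont=STKaiti
---

Hello 你好
EOF
}
```

    After running write_cjk_sample cjk.md, compile as in the question (with pandoc 2.0 or later, write --pdf-engine=xelatex instead of --latex-engine=xelatex):

    pandoc -s -o cjk.pdf --latex-engine=xelatex cjk.md

    The main CJK font can also be set on the command line instead, with -V CJKmainfont=STSong.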
    

    In my case, pandoc reads a .md file, converts it to .tex, and calls the LaTeX engine to compile that into a .pdf. So, in principle, anything I can normally do with a TeX engine can also be done through pandoc; I just have to supply the right template.

    pandoc uses its own built-in LaTeX template, which can be dumped from the terminal with:

    pandoc -D latex > default.latex
    

    This writes the template to a new file, default.latex, in the current directory. This copy is the one to modify. Then, when compiling, pass the modified template to pandoc with this option:

    --template=my-directory/my-template.latex
    

    Previously, whenever I typed Chinese (or, more generally, CJK) text in LaTeX, I used a preamble beginning with

    \documentclass[12pt]{article}
    \usepackage{xeCJK}% CJK support for XeLaTeX; Latin text keeps the main (Latin) font
    \usepackage{fontspec}% general font selection (xeCJK already loads it)
    \setCJKmainfont[BoldFont=STHeiti,ItalicFont=STKaiti]{STSong}
    \setCJKsansfont[BoldFont=STHeiti]{STXihei}
    \setCJKmonofont{STFangsong}
    % .... whatever xeCJK commands you use
    

    The fonts must be ones installed on your system; those shown above ship with macOS.

    But when I simply pasted this into the pandoc-provided template, I got many cryptic error messages such as

    option clash for package XXXX....
    

    This happens because the pandoc-provided template already loads xeCJK itself. Look for these lines:

    $if(CJKmainfont)$
        \usepackage{xeCJK}
        \setCJKmainfont[$for(CJKoptions)$$CJKoptions$$sep$,$endfor$]{$CJKmainfont$}
    $endif$
    

    These lines should be replaced with

    \usepackage{xeCJK}
    \setCJKmainfont[BoldFont=STHeiti,ItalicFont=STKaiti]{STSong}
    % .... and so on, whatever you call from xeCJK
    

    That is, delete the $if(CJKmainfont)$ and $endif$ lines so that xeCJK is always loaded; otherwise, unless CJKmainfont is set, the \usepackage{xeCJK} line is never copied into the intermediate .tex file. Also drop \usepackage{fontspec} from your own preamble: pandoc's template already loads it, and loading the package twice in the intermediate .tex file causes errors like the one above.
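    This replacement can also be scripted. The bash sketch below uses awk to copy the template while swapping the conditional block quoted above for an unconditional xeCJK setup. It assumes the block looks like the one quoted here: $if(CJKmainfont)$ and $endif$ each on their own line, with no nested $if$ in between. pandoc's default template has changed between versions, so check your default.latex first. The function name patch_cjk_template is hypothetical.

```bash
# Hypothetical helper: patch_cjk_template IN OUT
# Copies the LaTeX template IN to OUT, replacing the block
#   $if(CJKmainfont)$ ... $endif$
# with an unconditional xeCJK setup (fonts as in this answer).
# Assumption: the block contains no nested $if$ ... $endif$.
patch_cjk_template() {
  local in="$1" out="$2"
  awk '
    # Opening line of the conditional: print the replacement once, start skipping.
    /^[[:space:]]*\$if\(CJKmainfont\)\$[[:space:]]*$/ {
      print "\\usepackage{xeCJK}"
      print "\\setCJKmainfont[BoldFont=STHeiti,ItalicFont=STKaiti]{STSong}"
      skipping = 1
      next
    }
    # Closing $endif$ of that block: stop skipping and drop the line itself.
    skipping && /^[[:space:]]*\$endif\$[[:space:]]*$/ { skipping = 0; next }
    # Lines inside the old block are dropped.
    skipping { next }
    # Everything else is copied unchanged.
    { print }
  ' "$in" > "$out"
}
```

    Typical use, following the steps above:

    pandoc -D latex > default.latex
    patch_cjk_template default.latex my-template.latex
    pandoc -s -o cjk.pdf --latex-engine=xelatex --template=my-template.latex cjk.md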


    acknowledgement: [1] [2] [3]

    Sorry, I didn't keep track of every website I consulted; in any case, none of them got it all right, or they were outdated. Most helpful was mb21, who suggested in a comment that I output the .tex file to debug; that is how I found that xeCJK was not being included.
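    mb21's debugging tip is easy to reproduce: ask pandoc for .tex output instead of a PDF, then check whether the result loads xeCJK. A minimal bash sketch of that check follows; the helper name has_xecjk is hypothetical, and it only looks for an uncommented \usepackage{xeCJK}, optionally with [options].

```bash
# Hypothetical helper: succeed if the LaTeX file $1 loads xeCJK on a line
# where no % (LaTeX comment marker) appears before the \usepackage.
has_xecjk() {
  grep -Eq '^[^%]*\\usepackage(\[[^]]*])?\{xeCJK}' "$1"
}
```

    For example:

    pandoc -s -o cjk.tex cjk.md
    has_xecjk cjk.tex && echo 'xeCJK loaded' || echo 'xeCJK missing'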

    I spent more than 10 hours on this issue, but from now on I can happily type Chinese in a Markdown file. I have written this down for the sake of poor posterity.