Tags: java, pdf, unicode, cjk, wkhtmltopdf

Unicode chars are converted to broken symbols when I use wkhtmltopdf


I have an HTML file that contains some Unicode characters and is saved to disk as UTF-8. When I view it with less, all characters display correctly:

<h1>什么是Action?</h1>
<p>Play程序接收到的大部分请求,都是由<code>Action</code>来处理的。

But when I use "wkhtmltopdf" to convert it to PDF, it shows broken characters:

[Image: broken unicode]

My command is:

wkhtmltopdf --encoding utf-8 book.html book.pdf

How can I fix this?


Solution

  • I finally found the reason: my Ubuntu server had no fonts installed that cover these Unicode characters.

    After I uploaded some TrueType fonts from my local Ubuntu machine to the server, everything worked fine.

    freewind@freewind:/usr/share/fonts$ cd truetype/
    freewind@freewind:/usr/share/fonts/truetype$ ls
    arphic             ttf-dejavu               ttf-lao
    freefont           ttf-devanagari-fonts     ttf-liberation
    kochi              ttf-gujarati-fonts       ttf-malayalam-fonts
    msttcorefonts      ttf-indic-fonts-core     ttf-oriya-fonts
    openoffice         ttf-japanese-gothic.ttf  ttf-punjabi-fonts
    sazanami           ttf-japanese-mincho.ttf  ttf-tamil-fonts
    takao              ttf-kacst-one            ttf-telugu-fonts
    thai               ttf-kannada-fonts        unfonts
    ttf-bengali-fonts  ttf-khmeros-core         wqy
    

    I simply uploaded them all, which fixed the problem, although I don't know which font was the key one.
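
To narrow down whether the server is missing fonts for a given language before copying everything over, a check along these lines may help. This is only a sketch: it assumes fontconfig's `fc-list` is installed, the helper name `count_fonts_for_lang` is made up for illustration, and the package `fonts-wqy-zenhei` is just one possible choice on Ubuntu (suggested by the `wqy` directory in the listing above), not necessarily the font that fixed the original problem.

```shell
#!/bin/sh
# Hypothetical helper: count the fonts that fontconfig reports as covering
# a given language code (e.g. "zh" for Chinese).
# Prints 0 if fc-list is not available on this machine.
count_fonts_for_lang() {
    lang="$1"
    if ! command -v fc-list >/dev/null 2>&1; then
        echo 0
        return
    fi
    # One line per matching font; wc -l counts them.
    fc-list ":lang=$lang" family | wc -l | tr -d ' '
}

n=$(count_fonts_for_lang zh)
if [ "$n" -eq 0 ]; then
    echo "No Chinese-capable fonts found. One option (an assumption, not the only fix):"
    echo "  sudo apt-get install fonts-wqy-zenhei && fc-cache -f"
else
    echo "$n font(s) report coverage for zh"
fi
```

If the count is zero on the server but non-zero on the machine where the PDF renders correctly, missing fonts are a likely cause. After installing or copying fonts, running `fc-cache -f` refreshes the font cache.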