Tags: r, utf-8, latex, r-markdown, pdflatex

Writing UTF-8 in Windows using writeLines()


I use Windows 10 (64-bit), RStudio version 1.1.383, and MiKTeX 2.9. I am trying to produce PDFs from HTML text with rmarkdown, using the following function:

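# Convert each HTML text in x$content to a PDF named after x$id
library(rmarkdown)  # provides pandoc_convert()

write_pdfx <- function(x) {
  for (i in seq_len(nrow(x))) {
    message(sprintf("Processing %s", x$id[i]))
    # Write the HTML to a temporary file for pandoc to read
    tf <- tempfile(fileext = ".html")
    writeLines(x$content[i], tf, useBytes = FALSE)
    pandoc_convert(
      input = tf,
      to = "latex",
      output = sprintf("%s.pdf", x$id[i]),
      wd = getwd()
    )
    unlink(tf)  # remove the temporary file
  }
}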

The data frame df contains two columns: id with the IDs and content with the HTML texts. The encoding is declared as UTF-8:

Encoding(df$content) <- "UTF-8" 

Unfortunately, the HTML texts contain a lot of special characters. Most of them (such as "ü" or "ä") do not cause any problems. However, some, like "ẗ", cause an error:

pandoc.exe: Cannot decode byte '\xfc': Data.Text.Internal.Encoding.Fusion.streamUtf8: Invalid UTF-8 stream
Error: pandoc document conversion failed with error 1
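One possible reading of this message: the byte 0xFC is "ü" in Latin-1 / Windows-1252, which suggests the temporary file was written in the native Windows encoding rather than UTF-8. Note also that `Encoding<-` only changes the declared encoding of a string and does not convert its bytes. The sketch below illustrates the difference with a made-up two-character string (not the real data); `validUTF8()` requires R 3.3.0 or later.

```r
# A string whose bytes are Latin-1: "ü" is the single byte 0xFC.
latin1 <- rawToChar(as.raw(c(0x67, 0xfc)))  # "gü" in Latin-1
Encoding(latin1) <- "latin1"

# Declaring the string as UTF-8 leaves the bytes unchanged,
# so the lone 0xFC byte is now an invalid UTF-8 sequence.
mislabelled <- latin1
Encoding(mislabelled) <- "UTF-8"
charToRaw(mislabelled)  # 67 fc
validUTF8(mislabelled)  # FALSE

# Converting re-encodes the bytes: "ü" becomes 0xC3 0xBC.
converted <- enc2utf8(latin1)
charToRaw(converted)    # 67 c3 bc
validUTF8(converted)    # TRUE
```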

If I set useBytes = TRUE, I get another error:

! Package inputenc Error: Unicode char ẗ (U+1E97)
(inputenc)                not set up for use with LaTeX.

See the inputenc package documentation for explanation.
Type  H <return>  for immediate help.
 ...                                              

l.212 ...sene Vergleich, wonach ersterer gestüẗ

Try running pandoc with --latex-engine=xelatex.
pandoc.exe: Error producing PDF

I also tried xelatex, without success.

The related questions "Package inputenc Error" and "Error: pandoc document conversion failed with error 43" did not solve the problem.

I also found the following, but I could not apply it to my case: "How to write Unicode string to text file in R Windows?", "Hebrew Encoding Hell in R and writing a UTF-8 table in Windows", and https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16064

Is there any way to write UTF-8 using writeLines() on Windows in my case?


Solution

  • Found a solution: I had not used the right setting to switch to xelatex. Adding options = "--latex-engine=xelatex" to pandoc_convert() solved the problem! :D
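
For reference, a minimal sketch of the function from the question with that option added. It assumes the rmarkdown package and a pandoc 1.x installation; pandoc 2.0 and later renamed the flag to --pdf-engine, so the flag is left as an argument here. The helper write_utf8() and the argument name engine_flag are made up for illustration. The helper writes the text as UTF-8 bytes regardless of the Windows locale, which is one common way to sidestep the native-encoding translation in writeLines().

```r
# Hypothetical helper: write a character vector to `path` as UTF-8 bytes,
# independent of the native (Windows) locale encoding.
write_utf8 <- function(text, path) {
  con <- file(path, open = "wb")  # binary mode: no newline translation
  on.exit(close(con))
  # enc2utf8() converts to UTF-8; useBytes = TRUE writes those bytes as-is.
  writeLines(enc2utf8(as.character(text)), con, useBytes = TRUE)
}

# Convert each HTML text in x$content to a PDF named after x$id.
# engine_flag is an assumption: use "--pdf-engine=xelatex" with pandoc >= 2.0.
write_pdfx <- function(x, engine_flag = "--latex-engine=xelatex") {
  for (i in seq_len(nrow(x))) {
    message(sprintf("Processing %s", x$id[i]))
    tf <- tempfile(fileext = ".html")
    write_utf8(x$content[i], tf)
    rmarkdown::pandoc_convert(
      input   = tf,
      to      = "latex",
      output  = sprintf("%s.pdf", x$id[i]),
      options = engine_flag,
      wd      = getwd()
    )
    unlink(tf)  # remove the temporary file
  }
}
```

Calling write_pdfx(df) would then process the data frame from the question; xelatex handles characters such as "ẗ" (U+1E97) as long as the selected font contains the glyph.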