
How to use UTF-8-encoded character vectors in pdf?


I am trying to convert integers to UTF-8-encoded character vectors with intToUtf8 and use them as plotting symbols via pch=.

(.pch <- c(intToUtf8(9675), intToUtf8(9679)))
# [1] "○" "●"

While it works fine for png,

png('foo.png', 400, 400)

with(df, plot(x, y, pch=.pch[(z < 0) + 1], xlim=0:1, ylim=0:1))
legend('topleft', pch=.pch, legend=letters[1:2])

dev.off()

it won't for pdf,

pdf.options(encoding='ISOLatin2.enc')
pdf('foo.pdf', 4, 4)

with(df, plot(x, y, pch=.pch[(z < 0) + 1], xlim=0:1, ylim=0:1))
legend('topleft', pch=.pch, legend=letters[1:2])

dev.off()

but I want pdf.

Also tried 'ISOLatin1.enc' to no avail.

> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Zurich
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.3.1 tools_4.3.1   

Data:

df <- structure(list(x = c(0.2, 0.6, 0.7, 0.9), y = c(0.4, 0.7, 0.9, 
0.5), z = c(-1, -1, 1, 1)), class = "data.frame", row.names = c(NA, 
-4L))

Solution

  • One workaround I have found for UTF-8 characters in PDF plots in R is grDevices::cairo_pdf(), which, according to the docs, can (on suitable platforms) include a much wider range of UTF-8 glyphs and embeds the fonts used.

    grDevices::cairo_pdf('foo.pdf', 4, 4)
    
    with(df, plot(x, y, pch=.pch[(z < 0) + 1], xlim=0:1, ylim=0:1))
    legend('topleft', pch=.pch, legend=letters[1:2])
    
    dev.off()
    

    Note re vector vs bitmap output

    The docs also state:

    Note that unlike postscript and pdf, cairo_pdf and cairo_ps sometimes record bitmaps and not vector graphics.

    I am not exactly sure what "sometimes" means. In this case it does not happen for me: the result is a vector graphic. However, you may wish to check that this is also the case in your environment.
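
As a rough sketch of how one might check the "on suitable platforms" caveat before relying on cairo_pdf(), the snippet below first uses iconv() to show that the two glyphs from the question probably have no representation in Latin-1 or Latin-2 (which would explain why changing encoding= in pdf.options() does not help, since those encoding files only describe single-byte character codes), and then opens cairo_pdf() only if capabilities("cairo") reports cairo support. The names out, use_cairo, latin1 and latin2 and the tempdir() output path are illustrative assumptions, not part of the original question or answer, and the iconv() result may differ on platforms with a non-glibc iconv.

```r
# Sketch only: `out`, `use_cairo`, `latin1` and `latin2` are illustrative names.
.pch <- c(intToUtf8(9675), intToUtf8(9679))  # U+25CB and U+25CF, as in the question

# iconv() yields NA for strings it cannot convert (its default is sub = NA),
# so NA here suggests the target encoding has no slot for these glyphs.
latin1 <- iconv(.pch, from = "UTF-8", to = "ISO-8859-1")
latin2 <- iconv(.pch, from = "UTF-8", to = "ISO-8859-2")
print(is.na(latin1))
print(is.na(latin2))

# Same data as in the question.
df <- data.frame(x = c(0.2, 0.6, 0.7, 0.9),
                 y = c(0.4, 0.7, 0.9, 0.5),
                 z = c(-1, -1, 1, 1))

# capabilities("cairo") reports whether this R build has cairo support.
use_cairo <- isTRUE(unname(capabilities("cairo")))
out <- file.path(tempdir(), "foo.pdf")  # assumed output location

if (use_cairo) {
  grDevices::cairo_pdf(out, width = 4, height = 4)
  with(df, plot(x, y, pch = .pch[(z < 0) + 1], xlim = 0:1, ylim = 0:1))
  legend("topleft", pch = .pch, legend = letters[1:2])
  invisible(dev.off())
} else {
  message("No cairo support in this R build; cairo_pdf() is unavailable.")
}
```

If use_cairo turns out to be FALSE, this sketch deliberately writes nothing rather than falling back to pdf(), because the plain pdf device is the one that failed to render these symbols in the question.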