
UTF-8 encoding of umlauts and accents in popovers/tooltips of kableExtra in R Markdown


I want to display words with umlauts (e.g. äöü) and accents (e.g. éè) in a tooltip in a kableExtra table. However, something seems to go wrong with the encoding. See:

---
title: "R Markdown - Test umlaut"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(kableExtra)
library(dplyr)
```

If I create this simple table, I get a warning and what look like Chinese (?) characters in the tooltip. If I do the same with popover = "café", I get the same warning and no popover at all.

```{r kableextra}

x <- tibble(a = "Motörhead", b = "Motörfeet", c = "café", d = "olé") %>% 
  kbl() %>%
  kable_paper(full_width = FALSE)

x %>% column_spec(3, tooltip = "café")

```

## Warning in `xml_attr<-.xml_node`(`*tmp*`, t, value = tooltip_list[t]): string is
## not in UTF-8 [1303]


What puzzles me is that the umlauts/accents are displayed correctly in the table cells but not in the tooltip/popover.

Now I found that the problem can be solved using enc2utf8:

```{r kableextra2}

x %>% column_spec(3, tooltip = enc2utf8("café"))

```


What I find strange is that the string is entered in RStudio, so shouldn't it already be encoded in UTF-8? I also tried File -> Save with Encoding... -> UTF-8, which did not help.

Is the problem with kableExtra? Is there a more elegant way to solve it? I am not really happy with my workaround.

Session info:

R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_1.0.2      kableExtra_1.3.1

loaded via a namespace (and not attached):
 [1] rstudioapi_0.13   knitr_1.30        xml2_1.3.2        magrittr_2.0.1    tidyselect_1.1.0 
 [6] rvest_0.3.6       munsell_0.5.0     colorspace_2.0-0  viridisLite_0.3.0 R6_2.5.0         
[11] rlang_0.4.9       highr_0.8         stringr_1.4.0     httr_1.4.2        tools_4.0.3      
[16] webshot_0.5.2     xfun_0.19         tinytex_0.28      ellipsis_0.3.1    htmltools_0.5.0  
[21] yaml_2.2.1        digest_0.6.27     tibble_3.0.4      lifecycle_0.2.0   crayon_1.3.4     
[26] purrr_0.3.4       vctrs_0.3.5       rsconnect_0.8.16  glue_1.4.2        evaluate_0.14    
[31] rmarkdown_2.5     stringi_1.5.3     pillar_1.4.7      compiler_4.0.3    generics_0.1.0   
[36] scales_1.1.1      pkgconfig_2.0.3 

RStudio Version 1.3.1093

Solution

  • This does appear to be a bug in kableExtra, fixed here: https://github.com/haozhu233/kableExtra/pull/584. The warning message points to the cause: kableExtra sets some XML attributes using your input. The xml2 package expects those strings to be encoded in UTF-8, but by default most Windows systems use some other native encoding (Windows-1252 in your session info).

    Maybe this should be fixed in xml2 instead, but at least with that patch, you can work around the issue.
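
To see the encoding mismatch in isolation, here is a minimal base-R sketch. The variable `s_native` is a hypothetical stand-in: it is built from raw bytes and marked as Latin-1 so the example behaves the same on any platform, on the assumption that this approximates the kind of string a Windows-1252 session passes along. It does not reproduce kableExtra's or xml2's internals.

```r
# Build "café" as single-byte Latin-1 (0xe9 = é), a portable stand-in for
# a string in a Windows-1252 native encoding (assumption for illustration).
s_native <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))
Encoding(s_native) <- "latin1"

# Re-encode to UTF-8, as in the workaround from the question.
s_utf8 <- enc2utf8(s_native)

# The raw bytes differ: é is one byte in Latin-1 but two bytes in UTF-8.
charToRaw(s_native)  # 63 61 66 e9
charToRaw(s_utf8)    # 63 61 66 c3 a9

# validUTF8() looks at the bytes only; just the re-encoded string passes.
validUTF8(s_native)  # FALSE
validUTF8(s_utf8)    # TRUE
```

This suggests why the table cells can look fine while the tooltip does not: the same characters can be stored as different bytes, and code that requires UTF-8 will reject or misread the single-byte form.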