
R handles some characters differently when installed with apt / compiled from source


R 3.4.4 from the Ubuntu repositories:

> "µV"
[1] "\302\265V"

Same computer, R 3.4.4 (and also 3.2.0, and also 3.5.1) compiled from the sources obtained from CRAN:

> "µV"
[1] "µV"

I'd prefer the second behaviour. Where does the difference come from?

Encoding("µV") returns "unknown" in the first case and "UTF-8" in the second, but manually setting the encoding of a string variable doesn't seem to change how it is printed.
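For anyone comparing the two builds, a minimal diagnostic sketch may help. The "\302\265" in the first output are the octal escapes of the bytes 0xC2 0xB5, which is the UTF-8 encoding of µ, so the string itself is likely intact and only its printed form differs. The variable names below are illustrative, and the string is written with a \u escape so the example does not depend on how the script file or terminal is encoded.

```r
# Build the string from a Unicode escape so the bytes are known in advance.
x <- "\u00b5V"

# Encoding mark that R attaches to the string.
enc <- Encoding(x)
print(enc)

# Underlying bytes: c2 b5 is the UTF-8 encoding of the micro sign, 56 is "V".
bytes <- charToRaw(x)
print(bytes)

# What the session believes about its character set. How non-ASCII
# strings are printed depends on this, not only on the string's own mark.
print(Sys.getlocale("LC_CTYPE"))
print(l10n_info())
```

If the bytes are the same in both builds but l10n_info() or LC_CTYPE differ, the difference is in the session locale rather than in the string.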


Solution

  • For some reason, the locale reported by Sys.getlocale() was different in those two builds. Running Sys.setlocale("LC_COLLATE", "en_US.UTF-8") on the first build seems to have fixed the issue.
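
A rough sketch of checking and changing the locale from within a session follows. Note that it sets LC_CTYPE rather than the LC_COLLATE category used above: LC_CTYPE is the category that governs character handling, so it is the one I would try first, but that is a substitution on my part and not what the answer reports. The locale name "en_US.UTF-8" is taken from the answer and is assumed to be installed on the system (locale -a in a shell lists what is available).

```r
# Inspect the current settings before changing anything.
ctype <- Sys.getlocale("LC_CTYPE")
is_utf8 <- l10n_info()[["UTF-8"]]
print(Sys.getlocale())   # all categories at once
print(ctype)
print(is_utf8)

# Try to switch the character-type category to a UTF-8 locale.
# Sys.setlocale() returns "" (with a warning) if the request cannot
# be honoured, e.g. because the locale is not installed.
result <- suppressWarnings(Sys.setlocale("LC_CTYPE", "en_US.UTF-8"))

if (identical(result, "")) {
  message("en_US.UTF-8 is not available on this system; locale unchanged")
} else {
  # In a UTF-8 locale this should print the micro sign itself
  # rather than octal escapes.
  print("\u00b5V")
}
```

A change made with Sys.setlocale() only lasts for the current session. If the two builds start with different locales, comparing environment variables such as LANG and LC_ALL as seen by each build (Sys.getenv()) may show where the difference originates.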