Tags: database, encoding, shiny, dplyr, r-dbi

Character encoding with dplyr and a database (PostgreSQL)


I've read the threads and package updates on encoding issues with Shiny, but I have a database-driven Shiny app (hard to turn into a reproducible example) that is mangling some special characters.

In my PostgreSQL database my Swedish river is stored correctly as "Upper Umeälven River". When I pull it back into the Shiny interface with dplyr:

    names.rivers <- filter(tbl.rivers, Country == "Sweden")

...it becomes "Upper UmeÃ¤lven River" in R.

I'm using UTF-8 encoding locally; I guess I'm losing something on the exchange with the database.

    Sys.getlocale()
    [1] "LC_COLLATE=French_France.1252;LC_CTYPE=French_France.1252;LC_MONETARY=French_France.1252;LC_NUMERIC=C;LC_TIME=French_France.1252"

Apologies again for the lack of an example; it's ONLY an issue when pulling from the database. I suspect I'm missing a flag on some sanitizing function somewhere, but I need some help getting pointed in the right direction.
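The symptom can be imitated without a database. This is a minimal sketch, assuming the driver hands R the UTF-8 bytes of the name and the session then reads them as a single-byte Latin encoding; the variable names are illustrative and not taken from the app.

```r
# The correct name, written with a Unicode escape so the script does not
# depend on the encoding of the source file.
river <- "Upper Ume\u00e4lven River"

# UTF-8 stores "ä" as two bytes (c3 a4), so the byte vector is one element
# longer than the number of characters.
utf8_bytes <- charToRaw(enc2utf8(river))

# Assumption: the bytes reach R without an encoding mark and are treated
# as Latin-1 / Windows-1252, so each byte is read as its own character.
misread <- rawToChar(utf8_bytes)
Encoding(misread) <- "latin1"

# Re-encode for display; the "ä" now shows up as two separate characters.
misread_utf8 <- enc2utf8(misread)
print(misread_utf8)
```

Comparing nchar(river) with length(utf8_bytes), or calling Encoding() on a column pulled from the database, is a quick way to check whether this kind of mismatch is in play.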


Solution

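  • As suspected, the answer was simple: iconv(vector.to.convert, "UTF-8"). The second positional argument is from, and to defaults to "" (the current locale's encoding), so this reads the incoming strings as UTF-8 and converts them to the session's native encoding.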

    My "learnings":

    1. Encodings of the source file, the database, and data streams are not the same thing;
    2. I spent my time making sure the data sources had been created in the correct encoding, and ignored the (implicit?) conversion of the data stream;
    3. This page helped: http://shiny.rstudio.com/articles/unicode.html

    My understanding is a bit shallow, but - frankly - I'm not digging deeper into the world of character encoding for the moment. I hope it helps someone else avoid the error!
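For a whole query result rather than a single vector, the same conversion can be applied to every character column. This is a rough sketch, assuming the strings arrive as UTF-8 bytes without an encoding mark; fix_utf8 and the sample data frame are hypothetical, and it passes to = "UTF-8" explicitly (instead of relying on the native encoding, as the one-liner above does) so that the result does not depend on the locale.

```r
# Simulated query result: UTF-8 bytes with no encoding mark, standing in for
# what the database driver returned (an assumption about the cause).
raw_name <- rawToChar(charToRaw(enc2utf8("Upper Ume\u00e4lven River")))
names.rivers <- data.frame(River = raw_name, Country = "Sweden",
                           stringsAsFactors = FALSE)

# Hypothetical helper: run iconv() over every character column, reading the
# bytes as UTF-8 and returning strings that are marked as UTF-8.
fix_utf8 <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], iconv, from = "UTF-8", to = "UTF-8")
  df
}

fixed <- fix_utf8(names.rivers)
print(Encoding(fixed$River))
```

Doing the conversion once, right after the data is collected from the database, keeps the rest of the app free of encoding special cases.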