Search code examples
rpostgresqlrodbcgreenplum

rodbc character encoding error with PostgreSQL


I'm getting a new error which I've never gotten before when connecting from R to a GreenPlum PostgreSQL database using RODBC. I've gotten the error using both EMACS/ESS and RStudio, and the RODBC call has worked as is in the past.

library(RODBC)
gp <- odbcConnect("greenplum", believeNRows = FALSE)
data <- sqlQuery(gp, "select * from mytable")

> data
[1] "22P05 7 ERROR: character 0xc280 of encoding \"UTF8\" has no equivalent in  "WIN1252\";\nError while executing the query" 
[2] "[RODBC] ERROR: Could not SQLExecDirect 'select * from mytable'"

EDIT: Just tried querying another table and did get results. So I guess it's not an RODBC problem but a PostgreSQL table encoding problem.

R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RODBC_1.3-2
> 

Solution

  • First, the issue arises because R is trying to convert to a Windows locale that supports UTF8. Unfortunately, Brian Ripley has reported numerous times that Windows has no UTF8 locales. From hours spent searching the web, StackOverflow, Microsoft, etc., I have come to the conclusion that Microsoft hates UTF-8 Windows won't support UTF8.

    As a result, I'm not sure that there's an easy solution to this, if there is any solution at all. The best I can recommend is to wrap some kind of conversion on the server side, look at filtering the data if you can, or try a different language, if appropriate (e.g. Chinese, Japanese, Korean).

    If you do decide to wrap a converter, unicode.org recommends this ICU toolkit.