I'm getting a new error which I've never gotten before when connecting from R to a GreenPlum PostgreSQL database using RODBC. I've gotten the error using both EMACS/ESS and RStudio, and the RODBC call has worked as is in the past.
library(RODBC)
gp <- odbcConnect("greenplum", believeNRows = FALSE)
data <- sqlQuery(gp, "select * from mytable")
> data
[1] "22P05 7 ERROR: character 0xc280 of encoding \"UTF8\" has no equivalent in "WIN1252\";\nError while executing the query"
[2] "[RODBC] ERROR: Could not SQLExecDirect 'select * from mytable'"
EDIT: Just tried querying another table and did get results. So I guess it's not an RODBC problem but a PostgreSQL table encoding problem.
R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RODBC_1.3-2
>
First, the issue arises because R is trying to convert to a Windows locale that supports UTF8. Unfortunately, Brian Ripley has reported numerous times that Windows has no UTF8 locales. From hours spent searching the web, StackOverflow, Microsoft, etc., I have come to the conclusion that Microsoft hates UTF-8 Windows won't support UTF8.
As a result, I'm not sure that there's an easy solution to this, if there is any solution at all. The best I can recommend is to wrap some kind of conversion on the server side, look at filtering the data if you can, or try a different language, if appropriate (e.g. Chinese, Japanese, Korean).
If you do decide to wrap a converter, unicode.org recommends this ICU toolkit.