Search code examples
ms-accessencodingcharacter-encodingrstudiorodbc

Handling Japanese characters in query to MS access database (in RStudio)


I would like to know how to handle Japanese characters in a query to a Microsoft Access database. I am trying to use a query selecting variable names written in Japanese using the function odbcQuery from RODBC package in R.

I am working with Windows. My version of RStudio is 1.1.383, and my version of Access is 14.0.7015.1000 (32-bit).

I think R understands the Japanese characters in my query, but when I try to actually carry out the query I get the following error message:

> query <- "SELECT [LOA-FTD_1_5_1_CALCULATE_LOA_query].月日 FROM [LOA-FTD_1_5_1_CALCULATE_LOA_query]"
> sqlQuery(channel,query)
[1] "42000 -3100 [Microsoft][ODBC Microsoft Access Driver] Syntax error in query expression '[LOA-FTD_1_5_1_CALCULATE_LOA_query].<U+6708><U+65E5>'."
[2] "[RODBC] ERROR: Could not SQLExecDirect 'SELECT [LOA-FTD_1_5_1_CALCULATE_LOA_query].<U+6708><U+65E5> FROM [LOA-FTD_1_5_1_CALCULATE_LOA_query]'"

Here, 月日 was converted into U+6708 and U+65E5 in the error message. These are the UTF-8 codes for the two characters, so I guess the string is sent encoded in UTF-8 to MS Access, but MS Access is then unable to read it? Is MS Access even part of the process of carrying out the query?

So it must be an encoding issue, where RStudio and MS Access do not understand each other. When I looked at similar issues with Japanese characters, the problem was usually to display values in a table. Here the variable names are in Japanese, so the query does not work at all.

I am quite lost, so I am open to any idea or remark.

Thank you.


Solution

  • I found an answer that works for me in this post.

    The trick (at least in my case) was to set locale to Japanese_Japan.932 before any data importing. Here is the code for this command:

    Sys.setlocale("LC_ALL", locale = "Japanese_Japan.932")
    

    Then I imported my data from Access without having to change encoding, and the Japanese characters are displayed correctly in the resulting data frame. Moreover, this allows Japanese characters in the query to be understood.