Using Japanese letters in paste function


I want to download search query data from Google Trends for both Japanese and English search terms. It works perfectly fine when I use English search terms only, but it stops working as soon as I include Japanese characters.

My code is the following (I included a default keyword just for this example to make it easier to use):

URL_GT=function(keyword="Toyota Aygo %2B Toyota Yaris %2B Toyota Vitz %2B トヨタヴィッツ", year=2010, month=1, length=68){

  start="http://www.google.com/trends/trendsReport?hl=en-US&q="
  end="&cmpt=q&content=1&export=1"
  date=""

  queries=keyword[1]
  if(length(keyword)>1) {
    for(i in 2:length(keyword)){
      queries=paste(queries, "%2C ", keyword[i], sep="")
    }
  }

  #Dates
  if(!is.na(year)){
    date="&date="
    date=paste(date, month, "%2F", year, " ", month+length-1, "m", sep="")
  }

  URL=paste(start, queries, date, end, sep="")
  browseURL(URL)
}
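
As a side note, the loop that joins several keywords with an encoded comma can be written more compactly with the `collapse` argument of paste(). The following is a minimal sketch, not part of the original code; the sample keywords and variable names are only illustrative.

```r
keyword <- c("Toyota Aygo", "Toyota Yaris", "Toyota Vitz")

# Loop version, as in the function above.
queries_loop <- keyword[1]
if (length(keyword) > 1) {
  for (i in 2:length(keyword)) {
    queries_loop <- paste(queries_loop, "%2C ", keyword[i], sep = "")
  }
}

# Vectorised version: collapse joins all elements with the separator.
queries_collapse <- paste(keyword, collapse = "%2C ")

print(queries_collapse)
print(identical(queries_loop, queries_collapse))
```

Both versions build the same string, so the choice is mainly a matter of readability.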

When I look at the download URL that gets opened in my browser, I can see that the Japanese characters were transformed into sequences of % signs, digits and letters, but they are not supposed to change at all.
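
For context, a % followed by two hex digits is how a URL represents bytes outside plain ASCII (percent-encoding), so some transformation of Japanese characters in a URL is normally expected. What can differ is which byte encoding gets percent-encoded. The following minimal sketch, which is not part of the original code, shows what the UTF-8 form looks like using utils::URLencode(). The variable names are illustrative, and that the target site expects UTF-8 is an assumption.

```r
# The katakana word for "Toyota", written with \u escapes so the example
# does not depend on the encoding of the script file.
keyword <- "\u30c8\u30e8\u30bf"

# Percent-encode the UTF-8 bytes of the string.
encoded <- utils::URLencode(enc2utf8(keyword), reserved = TRUE)
print(encoded)

# Decoding returns the same bytes; mark them as UTF-8 to get the text back.
decoded <- utils::URLdecode(encoded)
Encoding(decoded) <- "UTF-8"
print(decoded == keyword)
```

If the URL produced on the affected machine contains different hex values for the same word, that would suggest the string was encoded in the native code page rather than in UTF-8.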

When I use

Sys.setlocale("LC_CTYPE","japanese_JAPAN")

I get the following paste result

paste("トヨタヴィッツ","Toyota Vitz", sep = "")
[1] "ƒgƒˆƒ^ƒ”ƒBƒbƒcToyota Vitz"

I think this shows pretty clearly that the paste() function does not work as intended here.
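
The garbled output above is consistent with the Shift-JIS (code page 932) bytes of the Japanese text being displayed as Windows-1252 characters, rather than with paste() altering the string. The sketch below reproduces the first two garbled characters under that assumption. It relies on iconv() recognising the encoding names "CP932" and "CP1252", which depends on the platform.

```r
# "\u30c8" is the first katakana character of the keyword, written as an
# escape so the example does not depend on the encoding of the script file.
to <- "\u30c8"

# Encode the character as Shift-JIS (code page 932) and keep the raw bytes.
sjis_bytes <- iconv(to, from = "UTF-8", to = "CP932", toRaw = TRUE)
print(sjis_bytes[[1]])

# Reinterpret those same bytes as Windows-1252 text.
garbled <- iconv(sjis_bytes, from = "CP1252", to = "UTF-8")
print(garbled)

# paste() only concatenates; it does not change the characters.
joined <- paste(to, "Toyota Vitz", sep = "")
print(nchar(joined))
```

If the second print shows the same two characters that start the garbled result, the problem lies in how the bytes are interpreted for display, not in the concatenation.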

Using

Sys.setlocale("LC_CTYPE","german_GERMANY")

I get the following error message

unexpected INCOMPLETE_STRING
1: URL_GT=function(keyword="Toyota Aygo %2B Toyota Yaris %2B Toyota Vitz %2B ?

indicating that R cannot interpret the Japanese characters.

I tried to find a solution, but could only find tips that led me to change my locale. As described above, this has not worked for me so far. I also found this tip, but I got the same warning as the asker of that question, namely

Warning message:
In Sys.setlocale("LC_CTYPE", "UTF-8") :
  OS reports request to set locale to "UTF-8" cannot be honored
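
Which locale names are accepted depends on the operating system, so it can help to test candidates programmatically instead of by trial and error. The following is a minimal sketch: `try_locale` is a hypothetical helper, and the candidate names are examples that may not exist on a given machine.

```r
# Hypothetical helper: try several locale names for LC_CTYPE and return the
# first one the operating system accepts, or NA if none is accepted.
try_locale <- function(candidates) {
  for (name in candidates) {
    # Sys.setlocale() returns "" (with a warning) when the request fails.
    result <- suppressWarnings(Sys.setlocale("LC_CTYPE", name))
    if (nzchar(result)) {
      return(result)
    }
  }
  NA_character_
}

old <- Sys.getlocale("LC_CTYPE")   # remember the current setting
chosen <- try_locale(c("Japanese_Japan.932", "ja_JP.UTF-8", "C"))
print(chosen)
print(l10n_info())                 # reports whether the session is UTF-8 / multibyte
Sys.setlocale("LC_CTYPE", old)     # restore the previous setting
```

The "C" entry is only a fallback so that the helper returns something; it does not support Japanese text.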

I am very grateful for any help! Since this is my first post ever, I hope that everything concerning structure and detail is alright.


Solution

  • I found a solution that works just fine for me. I had to change the Windows language for non-Unicode programs (the system locale) for the Japanese locale to work properly.

    On Windows 8.1, go to Control Panel > Clock, Language, and Region > Region > Administrative and change the language for non-Unicode programs accordingly, in my case to Japanese. Restart your PC afterwards.

    If you now set your locale to

    Sys.setlocale("LC_CTYPE","japanese_JAPAN")
    

    calling paste() should return what you asked for, e.g.

    paste("It works", "トヨタヴィッツ", sep=" ")
    [1] "It works トヨタヴィッツ"
    

    The only thing that still confuses me is that when I open the downloaded file in Excel, the Japanese characters appear garbled in a new, cryptic way.

    I tried downloading the data for the word manually and got the same result in Excel, so I guess the data itself is correct. Unfortunately, I did not download a CSV file of the Japanese data before I changed my system locale, so I cannot tell whether Excel garbled it then as well. But when I restored my settings to German, the same cryptic characters appeared in the downloaded file.
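
One way to check whether the downloaded file itself is intact, independent of how Excel displays it, is to read it in R with an explicit encoding. The sketch below assumes the export is UTF-8 encoded, which nothing above confirms. The file contents are made-up stand-in rows, not real Google Trends data.

```r
# Stand-in for the downloaded file: made-up rows written as UTF-8 bytes.
csv_lines <- c("term,value", "\u30c8\u30e8\u30bf,42")
tmp <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp, useBytes = TRUE)

# encoding = "UTF-8" marks the strings read from the file as UTF-8.
dat <- read.csv(tmp, encoding = "UTF-8", stringsAsFactors = FALSE)
print(Encoding(dat$term))
print(dat$term == "\u30c8\u30e8\u30bf")
unlink(tmp)
```

If the text reads back correctly in R but still looks cryptic in Excel, the file is likely fine and the garbling comes from the encoding Excel assumes when it opens the CSV.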