I have the following script:
city <- c("Екатеринбург", NA, "Курск", "Псков",
          "березники", "Челябинск", NA, "москва",
          "москва", "Петергоф/Санкт-Петербург",
          "Петергоф/Санкт-Петербург", "Волгоград",
          "Олегегорск", "СПб", "Москва", "Москва",
          "Москва ", "Санкт-Петербург")
city[grep("^(москва|мск|msk)", city, ignore.case = TRUE)] <- "Москва"
city[grep("питер|спб|spb|петербург", city, ignore.case = TRUE)] <- "Санкт-Петербург"
city[grep("Москва|Санкт-Петербург", city, invert = TRUE)] <- "Другие города"
print(city)
When I run Rscript test.R
I get the following results:
% Rscript test.R
[1] "Другие города" "Другие города" "Другие города" "Другие города"
[5] "Другие города" "Другие города" "Другие города" "Москва"
[9] "Москва" "Санкт-Петербург" "Санкт-Петербург" "Другие города"
[13] "Другие города" "Санкт-Петербург" "Москва" "Москва"
[17] "Москва" "Санкт-Петербург"
When I run source("test.R")
I get different results:
% Rscript -e 'source("test.R")'
[1] "Другие города" "Другие города"
[3] "Другие города" "Другие города"
[5] "Другие города" "Другие города"
[7] "Другие города" "Москва"
[9] "Москва" "Петергоф/Санкт-Петербург"
[11] "Петергоф/Санкт-Петербург" "Другие города"
[13] "Другие города" "Другие города"
[15] "Москва" "Москва"
[17] "Москва " "Санкт-Петербург"
I get correct results when running the script directly with Rscript: Rscript test.R
I get incorrect results with source(), whether it is called through Rscript -e or inside an R session.
System info may be helpful:
sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Arch Linux
locale:
[1] LC_CTYPE=ru_RU.UTF-8 LC_NUMERIC=C LC_TIME=ru_RU.UTF-8 LC_COLLATE=C
[5] LC_MONETARY=ru_RU.UTF-8 LC_MESSAGES=ru_RU.UTF-8 LC_PAPER=ru_RU.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=ru_RU.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tools_3.2.1
This is a file encoding issue. Pass the following arguments to source(): encoding = "UTF-8", verbose = TRUE
If you omit the encoding argument (keeping verbose = TRUE), the top of the output shows that the default is encoding = "native.enc", which is not what you want for Cyrillic characters.
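The suggestion above can be sketched as follows. This is a minimal illustration, not the asker's exact setup: it assumes the script is saved as UTF-8, and it writes a shortened stand-in for test.R to a temporary file (the `script` and `lines` names are hypothetical) so the snippet can be run on its own.

```r
# Shortened stand-in for test.R (assumption: the real file is saved as UTF-8).
lines <- c(
  'city <- c("москва", "СПб", "Курск")',
  'city[grep("^(москва|мск|msk)", city, ignore.case = TRUE)] <- "Москва"',
  'city[grep("питер|спб|spb|петербург", city, ignore.case = TRUE)] <- "Санкт-Петербург"'
)

# Write the lines as UTF-8 bytes to a temporary file.
script <- tempfile(fileext = ".R")
writeLines(enc2utf8(lines), script, useBytes = TRUE)

# Declare the file's encoding explicitly instead of relying on the
# default taken from getOption("encoding"), which is "native.enc".
source(script, encoding = "UTF-8")
print(city)
```

For the original script the equivalent call would be source("test.R", encoding = "UTF-8"). Adding verbose = TRUE makes source() print diagnostics, including the encoding it assumes.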