I would like to decode this string in R: обезпечен. The desired output should be: обезпечен
This site suggests that the source encoding is UTF-8 and that it should be transcoded to Windows-1251, so I tried the following, with no success:
> word <- "обезпечен"
> iconv(word, from = "UTF-8", to = "Windows-1251")
[1] "обезпечен"
These steps seem to do the trick:
word <- "обезпечен"
xx <- iconv(word, from="UTF-8", to="cp1251")
Encoding(xx) <- "UTF-8"
xx
# [1] "обезпечен"
target <- "обезпечен"
xx == target
# [1] TRUE
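If you need this for more than one value, the same two steps can be wrapped in a small function. This is only a sketch: `fix_mojibake` is a name I made up, and the guard that keeps the input whenever the result is `NA` or not valid UTF-8 is my own assumption about what you would want for strings that were never garbled. The strings are written with `\u` escapes so the example does not depend on the encoding of the script file.

```r
# Hypothetical helper (not part of base R): undo text whose UTF-8 bytes
# were wrongly decoded as cp1251.
fix_mojibake <- function(x) {
  # Map every character back to the single cp1251 byte it came from.
  # Characters that have no cp1251 equivalent make iconv() return NA.
  out <- iconv(enc2utf8(x), from = "UTF-8", to = "cp1251")
  # Declare that the recovered bytes are UTF-8 text.
  Encoding(out) <- "UTF-8"
  # Keep the input where the round trip failed or gave invalid UTF-8,
  # which is what happens for strings that were not garbled to begin with.
  bad <- is.na(out) | !validUTF8(out)
  out[bad] <- x[bad]
  out
}

# The garbled word and the desired word, spelled with escapes.
garbled <- "\u0420\u0455\u0420\u00b1\u0420\u00b5\u0420\u00b7\u0420\u0457\u0420\u00b5\u0421\u2021\u0420\u00b5\u0420\u0405"
target <- "\u043e\u0431\u0435\u0437\u043f\u0435\u0447\u0435\u043d"

fix_mojibake(garbled) == target
# [1] TRUE
fix_mojibake(target) == target
# [1] TRUE
```

The second call shows the guard at work: a correct Cyrillic string converts to cp1251 bytes that are not valid UTF-8, so the function hands back the input unchanged.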
So it seems that at some point the bytes that make up the UTF-8 target value were misinterpreted as cp1251, and some process then converted them to UTF-8 using the cp1251->UTF-8 mapping rules. Converting from UTF-8 back to cp1251 recovers the original bytes, and marking them as UTF-8 restores the text. However, when you run the cp1251->UTF-8 conversion on data that isn't really cp1251 encoded, you get weird values:
iconv(target, from="cp1251", to="UTF-8")
# [1] "обезпечен"
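To make the explanation above concrete, here is a byte-level view of the same round trip. It is a sketch under the assumption that your iconv build knows the name "cp1251", as it did in the steps above. The variable names are mine, and the target word is written with `\u` escapes so the result does not depend on the encoding of the script file.

```r
# The desired word, spelled with escapes so it is stored as UTF-8.
target <- "\u043e\u0431\u0435\u0437\u043f\u0435\u0447\u0435\u043d"

# Its UTF-8 representation: 9 characters, 2 bytes each.
utf8_bytes <- charToRaw(enc2utf8(target))
utf8_bytes
# [1] d0 be d0 b1 d0 b5 d0 b7 d0 bf d0 b5 d1 87 d0 b5 d0 bd

# The misreading: each of the 18 bytes is taken as one cp1251 character.
garbled <- iconv(enc2utf8(target), from = "cp1251", to = "UTF-8")
nchar(target)
# [1] 9
nchar(garbled)
# [1] 18

# The repair: map each garbled character back to its cp1251 byte,
# which gives exactly the original UTF-8 bytes.
back <- charToRaw(iconv(garbled, from = "UTF-8", to = "cp1251"))
identical(back, utf8_bytes)
# [1] TRUE
```

The doubling from 9 to 18 characters is a handy symptom: Cyrillic text that was garbled this way is about twice as long as it should be and is dominated by the letters that cp1251 assigns to bytes d0 and d1.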