r arabic persian coercion

How to convert Arabic numbers stored as character strings to English numerics in R?


I have a character data frame whose first column contains Arabic/Persian numbers, so the class of these numbers is "character". How can I convert them to English numerics in order to do some calculations with them?


Solution

  • It seems to be mostly a question of character mappings.

    Not extensively tested, but the following seems to work, at least for Persian number strings.

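    # Arabic-Indic digits (U+0660-U+0669) followed by Persian digits (U+06F0-U+06F9)
    persian <- "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669\u06F0\u06F1\u06F2\u06F3\u06F4\u06F5\u06F6\u06F7\u06F8\u06F9"
    # Western digits in the same order, one for each character of `persian`
    english <- "01234567890123456789"
    persian.tonumber <- function(s) as.numeric(chartr(persian, english, s))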
    

    For example,

    > persian.tonumber("٢٣٤٥")
    [1] 2345
    

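    I obtained the Unicode code points from this answer. The translation string above already covers both the Arabic-Indic digits (U+0660 to U+0669) and the Persian digits (U+06F0 to U+06F9), so Arabic and Persian number strings should both convert, though I am not really familiar with the system that you are referring to. You could extend the translation strings if need be to include further symbols.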
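
Since the question is about a data frame column, here is a rough sketch of how the same `chartr` mapping might be applied to a whole column, with the translation strings extended by one symbol. The helper name `to_number_ext`, the data frame `df` and its column names are made up for illustration. Handling the Arabic decimal separator (U+066B) and thousands separator (U+066C) is an assumption about what the data may contain, not something the original answer covers.

```r
# Arabic-Indic digits (U+0660-U+0669), Persian digits (U+06F0-U+06F9),
# plus the Arabic decimal separator (U+066B), which is an assumed extension.
from <- "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669\u06F0\u06F1\u06F2\u06F3\u06F4\u06F5\u06F6\u06F7\u06F8\u06F9\u066B"
to   <- "01234567890123456789."

# Hypothetical helper, not part of the original answer.
to_number_ext <- function(s) {
  # Drop Arabic thousands separators (U+066C) before translating.
  s <- gsub("\u066C", "", s, fixed = TRUE)
  as.numeric(chartr(from, to, s))
}

# Hypothetical data frame whose first column holds the number strings.
df <- data.frame(
  amount = c("\u0662\u0663\u0664\u0665",                      # Arabic-Indic digits
             "\u06F1\u06F2\u06F3",                            # Persian digits
             "\u0661\u066C\u0662\u0663\u0664\u066B\u0665"),   # with separators
  label  = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# chartr() and as.numeric() are vectorised, so the whole column converts at once.
df[[1]] <- to_number_ext(df[[1]])
str(df)
```

After this, `df[[1]]` is a numeric column, so `sum()`, `mean()` and similar functions can be used on it. Strings containing characters outside the mapping would come back as `NA` with a coercion warning from `as.numeric`.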