
R - transliterating into German alphabet using stri_trans_general()


I have a large number of names, mostly using a German character set, i.e., ASCII plus ä,ö,ü,ß. Some names use special characters (e.g. ğ) which I would like to transliterate into the German version. So, "Özoğuz" should become "Özoguz".

I have tried

stri_trans_general("Özoğuz", "de-ASCII")

but that returns "Oezoguz", not the desired "Özoguz".


Solution

  • The de-ASCII rule set transliterates Ö to Oe. If you want to deviate from this rule but otherwise keep the German ASCII rule set, note that the stringi docs state that custom rule-based transliteration is also supported.

    We can define rules that map upper- and lower-case Ö to placeholder characters, apply the de-ASCII rules to everything else, and then map the placeholders back to Ö and ö:

    id <- "
        Ö > \u2135;
        ö > \u2136;
        :: de-ASCII;
        \u2135 >  Ö;
        \u2136 > ö
    "
    
    stringi::stri_trans_general("Özoğuz", id, rules = TRUE)
    # [1] "Özoguz"
    

    I have used "ℵ" (U+2135) and "ℶ" (U+2136) as placeholders for upper- and lower-case Ö respectively, but any Unicode characters that you are sure will not appear in your strings, and that de-ASCII leaves unchanged, should work.
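
Since the question involves a large number of names, the rule set above can be wrapped in a small helper and applied to a whole character vector at once. This is only a sketch: `transliterate_keep_o_umlaut` and `keep_o_umlaut_rules` are hypothetical names, not part of stringi, and the rules are the same ones shown in the answer.

```r
library(stringi)

# Rule set from the answer above: park Ö/ö on placeholder characters,
# run de-ASCII on everything else, then restore Ö/ö.
keep_o_umlaut_rules <- "
    Ö > \u2135;
    ö > \u2136;
    :: de-ASCII;
    \u2135 > Ö;
    \u2136 > ö
"

# Hypothetical helper (not a stringi function): transliterate a character
# vector while leaving Ö and ö untouched.
transliterate_keep_o_umlaut <- function(x) {
  stri_trans_general(x, keep_o_umlaut_rules, rules = TRUE)
}

transliterate_keep_o_umlaut(c("Özoğuz", "Meier"))
# [1] "Özoguz" "Meier"
```

Note that this only protects Ö and ö. The de-ASCII rules still rewrite ä, ü and ß, so names containing those would presumably need their own placeholder pairs added in the same way.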