r encoding character-encoding rfacebook

Encoding in R: <> Unicode to letter


I am having problems extracting comments from posts with the Rfacebook package.

localiza <- getPage(543362459038077, token = my_oauth, n = 10)
post <- getPost(post = localiza$id[1], token = my_oauth)

The problem is the encoding of the output. For example:

algu<U+00E9>m

That word, for instance, should appear as

alguém

Any suggestions?

Thanks in advance!


Solution

  • Consider changing your locale; this is not a problem with Rfacebook. I can reproduce the behavior you describe by setting the locale to C, e.g.

    x <- "Boa tarde. Há alguém de plantão na agência esses dias?"
    Sys.setlocale(locale = "C")
    x
    # [1] "Boa tarde. H<U+00E1> algu<U+00E9>m de plant<U+00E3>o na ag<U+00EA>ncia esses dias?"
    

    Switching the LC_CTYPE category (the part of the locale that controls character handling) to a locale with an extended character set, such as a UTF-8 one, gives the desired output, e.g.

    Sys.setlocale(category = "LC_CTYPE", locale = "en_US.UTF-8")
    x
    # [1] "Boa tarde. Há alguém de plantão na agência esses dias?"
    

    The value of the locale argument may be different on your system. See https://stat.ethz.ch/R-manual/R-devel/library/base/html/locales.html (or ?locales) for more information on setting locales.
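
Since the right locale name varies by system, here is a minimal sketch of one way to check the current setting and try a few UTF-8 locales in turn. The candidate names below are assumptions (common on Linux and macOS), not a definitive list, and may not exist on your machine. The sketch relies on Sys.setlocale() returning an empty string when a request cannot be honored.

```r
# Test string written with a Unicode escape so the script itself stays ASCII.
x <- "algu\u00e9m"  # "alguém"

# Remember the current character-handling locale so it can be restored later.
old_ctype <- Sys.getlocale("LC_CTYPE")

# Hypothetical candidates: which of these exist depends on the operating system.
candidates <- c("en_US.UTF-8", "pt_BR.UTF-8", "C.UTF-8")

chosen <- ""
for (loc in candidates) {
  # Sys.setlocale() returns "" (with a warning) if the locale is unavailable.
  res <- suppressWarnings(Sys.setlocale("LC_CTYPE", loc))
  if (nzchar(res)) {
    chosen <- res
    break
  }
}

if (nzchar(chosen)) {
  # l10n_info() reports whether the session is now in a UTF-8 locale.
  print(l10n_info()[["UTF-8"]])
  print(x)
} else {
  message("None of the candidate locales is available on this system.")
}

# Restore the original setting so the rest of the session is unaffected.
invisible(Sys.setlocale("LC_CTYPE", old_ctype))
```

If none of the candidates works, the locales installed on the machine can usually be listed from a shell with locale -a on Unix-like systems; on Windows the locale names follow a different convention, described in ?locales.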