
Memory representation of ASCII control codes in R


If I write a <- "\n" in R, what does R store in memory?

Does it store 0x0A (line feed character), or does it store 0x5C6E (literal \n)?

Put differently, which of these is true? Either cat() outputs the string as it is stored in memory and print() turns control codes back into escape sequences (consistent with the first encoding above), or print() outputs the string as it is stored in memory and cat() interprets the escape sequences (consistent with the second encoding above).

I find this confusing because in C, "\n" is only a shorthand for writing a control code, so C stores the line feed character 0x0A in memory. I don't think you would ever get "\n" back from a string in C.


Solution

  • In a nutshell, R stores control characters directly as raw bytes. That is, '\n' is stored internally as the single byte 0x0A. As SamR suggested in the comments, you can verify this with charToRaw(), which shows the underlying bytes of a string:

    charToRaw('\n')
    # [1] 0a
    

    In this respect R behaves like essentially every other mainstream programming language, C included.

    But like many other language interpreters, R treats string values specially when printing them at the REPL: print() (which is also what runs when you type a bare value) shows a quoted representation in which control characters are written back as escape sequences. That’s why printing the value '\n' displays "\n" instead of a line break. To display the actual contents of a string, you therefore can’t use print().1 Instead, use cat(), writeLines() or, to write to the standard error stream, message().


    1 Unless you wrap the value in noquote(), but using a proper text output function is generally preferred.
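To contrast the two candidate encodings from the question directly, here is a minimal sketch using only base R functions (charToRaw(), nchar(), rawToChar(), intToUtf8()). It compares the one-character string "\n" with the two-character string "\\n" (a literal backslash followed by n). The variable names are illustrative only.

```r
# "\n" is an escape sequence in the *source code*; the parser turns it
# into a single line-feed byte (0x0A) before the value is stored.
lf <- "\n"

# "\\n" escapes the backslash itself, so this string really contains
# two characters: a backslash (0x5C) followed by "n" (0x6E).
literal <- "\\n"

nchar(lf)            # [1] 1
charToRaw(lf)        # [1] 0a
nchar(literal)       # [1] 2
charToRaw(literal)   # [1] 5c 6e

# The same line-feed character can be built without any escape sequence,
# which shows that "\n" is purely source-code syntax.
identical(lf, rawToChar(as.raw(0x0a)))   # [1] TRUE
identical(lf, intToUtf8(10L))            # [1] TRUE
```

In other words, the first encoding from the question (0x0A) is the one R uses, exactly as in C.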
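The display difference between print() and the text output functions can be sketched in base R as follows. The string x is an arbitrary example, and capture.output() is used only to make what each function writes checkable in code; the commented output is what an interactive session should show.

```r
x <- "a\nb"   # three characters: "a", line feed (0x0A), "b"

print(x)                 # quoted, escaped representation on one line
# [1] "a\nb"

cat(x, "\n", sep = "")   # writes the characters themselves
# a
# b

writeLines(x)            # same, but appends the final newline for you
# a
# b

# capture.output() collects what each call writes, line by line.
printed <- capture.output(print(x))
written <- capture.output(writeLines(x))
length(printed)   # [1] 1  (the line feed was displayed as "\n")
length(written)   # [1] 2  (the line feed actually broke the line)
nchar(x)          # [1] 3  (the stored value is the same either way)
```

Only the rendering differs; the value in memory is identical in every call.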