
R read.table csv with classic-mac line endings


I have a comma-separated value file that looks like this when I open it in vim:

12,31,50,,12^M34,23,45,2,12^M12,31,50,,12^M34,23,45,2,12^M

and so forth. I believe this means my CSV uses CR-only (classic Mac OS) line endings. R's read.table() function ostensibly requires LF line endings, or some variant thereof.

I know I can preprocess the file, and that's probably what I'll do.

That solution aside: is there a way to import CR files directly into R? For instance, write.table() has an "eol" parameter one can use to specify the line ending of outputs -- but I don't see a similar parameter for read.table() (cf. http://stat.ethz.ch/R-manual/R-patched/library/utils/html/read.table.html).


Solution

  • R will not recognize the literal two-character sequence "^M" as anything useful. (I suppose it's likely that vim is just showing you a Ctrl-M, i.e. a carriage return, as those characters.) If the literal "^M" were in a text-connection stream, R would treat it as two ordinary characters rather than an escape sequence, since "^" is not used for that purpose. You might need to do the pre-processing, unless you want to pass it through scan() and substitute using gsub():

    > subbed <- gsub("\\^M", "\n", scan(textConnection("12,31,50,,12^M34,23,45,2,12^M12,31,50,,12^M34,23,45,2,12^M"), what="character"))
    Read 1 item
    
    > read.table(text=subbed, sep=",")
      V1 V2 V3 V4 V5
    1 12 31 50 NA 12
    2 34 23 45  2 12
    3 12 31 50 NA 12
    4 34 23 45  2 12
    

    I suppose it's possible that the file holds real carriage returns rather than a literal "^M", in which case you would need "\r" as the pattern argument to gsub.

    A further note: The help page for scan says: "Whatever mode the connection is opened in, any of LF, CRLF or CR will be accepted as the EOL marker for a line and so will match sep = "\n"." So the line-ending characters (carriage returns, "\r", if that's what they are) should have been recognized, since read.table is based on scan. You should look at ?Quotes for information on escape characters.

    If this vim tutorial is to be believed, those may be DOS-related characters, since it offers this advice:

    Strip DOS ctrl-M's:

    :1,$ s/{ctrl-V}{ctrl-M}//
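
To check the scan help-page claim and the gsub() fallback on your own installation, something like the following sketch may help. It assumes the file really contains carriage returns ("\r") rather than the literal characters "^M". The temporary file and the names path, direct, raw_text and via_gsub are made up for illustration; the sample values are the ones from the question.

```r
# Write a small CR-only sample file (hypothetical; values taken from the question).
path <- tempfile(fileext = ".csv")
writeBin(charToRaw("12,31,50,,12\r34,23,45,2,12\r12,31,50,,12\r34,23,45,2,12\r"), path)

# Approach 1: read the file directly and let R's connection code
# handle the CR line endings, as the scan help page describes.
direct <- read.table(path, sep = ",")

# Approach 2: preprocess inside R. Read the whole file as one string,
# turn CR (and CRLF, if present) into LF, then parse the text.
raw_text <- readChar(path, nchars = file.info(path)$size)
via_gsub <- read.table(text = gsub("\r\n?", "\n", raw_text), sep = ",")

print(direct)
print(via_gsub)

unlink(path)  # remove the temporary file
```

If the direct read gives a sensible data frame, no preprocessing should be needed; if it does not, the second approach keeps the substitution inside R instead of relying on an external tool.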