I have a comma-separated value file that looks like this when I open it in vim:
12,31,50,,12^M34,23,45,2,12^M12,31,50,,12^M34,23,45,2,12^M
and so forth. I believe this means my CSV uses CR-only (classic Mac OS) line endings. R's read.table() function appears to require LF line endings, or some variant thereof.
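One way to confirm the guess about line endings from inside R is to count the raw carriage-return (0x0d) and line-feed (0x0a) bytes in the file. This is a minimal sketch: `count_eol_bytes()` is a hypothetical helper name, and the temp file it inspects stands in for the real CSV.

```r
# Hypothetical helper: tally CR (0x0d) and LF (0x0a) bytes in a file.
count_eol_bytes <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  c(CR = sum(bytes == as.raw(0x0d)), LF = sum(bytes == as.raw(0x0a)))
}

# Example: write a two-line file with CR-only endings, then inspect it.
tmp <- tempfile(fileext = ".csv")
writeLines(c("12,31,50,,12", "34,23,45,2,12"), tmp, sep = "\r")
counts <- count_eol_bytes(tmp)
print(counts)
```

A CR-only file should report no LF bytes at all, while a DOS (CRLF) file would report equal CR and LF counts.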
I know I can preprocess the file, and that's probably what I'll do.
That solution aside: is there a way to import CR files directly into R? For instance, write.table() has an "eol" parameter one can use to specify the line ending of the output, but I don't see a similar parameter for read.table() (cf. http://stat.ethz.ch/R-manual/R-patched/library/utils/html/read.table.html).
R will not recognize "^M" as anything useful. (Vim displays a Ctrl-M, i.e. a carriage return, as ^M, so that may be what the file really contains.) If those were the literal characters ^ and M in a text connection, R would not treat them as an escape sequence, since "^" has no escaping role in R strings. You may need to do the preprocessing after all, unless you want to pass the text through scan() and substitute with gsub():
> subbed <- gsub("\\^M", "\n", scan(textConnection("12,31,50,,12^M34,23,45,2,12^M12,31,50,,12^M34,23,45,2,12^M"), what="character"))
Read 1 item
> read.table(text=subbed, sep=",")
  V1 V2 V3 V4 V5
1 12 31 50 NA 12
2 34 23 45  2 12
3 12 31 50 NA 12
4 34 23 45  2 12
If the ^M is really a Ctrl-M (carriage return) rather than the two characters ^ and M, the pattern argument to gsub() would need to be "\r" instead.
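For comparison, here is the same substitution when the text holds real carriage returns. This sketch assumes the data arrive as a single string with embedded `\r` characters (typed here as R escapes); `raw_text` is only an illustrative name.

```r
# A string containing real carriage-return characters, not the two
# characters "^" and "M" that vim uses to display them.
raw_text <- "12,31,50,,12\r34,23,45,2,12\r12,31,50,,12\r34,23,45,2,12\r"

# Replace each CR with LF (fixed = TRUE: literal match, no regex needed).
subbed <- gsub("\r", "\n", raw_text, fixed = TRUE)

# Empty fields in numeric columns become NA.
df <- read.table(text = subbed, sep = ",")
print(df)
```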
A further note: the help page for scan says: "Whatever mode the connection is opened in, any of LF, CRLF or CR will be accepted as the EOL marker for a line and so will match sep = "\n"." So if those ^M characters are really carriage returns ("\r"), read.table should already recognize them as line endings, since read.table is built on scan. See ?Quotes for how escape sequences such as "\r" and "\n" are written in R strings.
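A quick way to check that claim is to write a small CR-only file and read it back with read.csv() and no preprocessing at all. This is a sketch in which a temp file stands in for the real data; if the quoted help text holds, the result should match the four-row data frame from the gsub() approach.

```r
# Write a small CR-only file (classic Mac OS line endings) to a temp path.
tmp <- tempfile(fileext = ".csv")
writeLines(c("12,31,50,,12", "34,23,45,2,12", "12,31,50,,12", "34,23,45,2,12"),
           tmp, sep = "\r")

# Read it back directly: no eol argument and no substitution step.
df <- read.csv(tmp, header = FALSE)
print(df)
```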
If this vim tutorial is to be believed, those may be DOS-related characters, since it offers this advice:
Strip DOS ctrl-M's:
:1,$ s/{ctrl-V}{ctrl-M}//
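If you do end up preprocessing, the vim command above has a rough R-side analogue. Simply deleting every Ctrl-M is only right for DOS (CRLF) files; in a CR-only file it would run all the lines together. So this sketch drops the CR from each CRLF pair and turns any remaining CR into LF. `normalize_eol()` is a hypothetical helper, not part of base R.

```r
# Hypothetical helper: rewrite a file with LF-only line endings.
normalize_eol <- function(in_path, out_path) {
  bytes <- readBin(in_path, what = "raw", n = file.size(in_path))
  cr <- as.raw(0x0d)
  lf <- as.raw(0x0a)
  n <- length(bytes)
  # Positions of CRs that are immediately followed by an LF (DOS endings).
  crlf <- which(bytes[-n] == cr & bytes[-1] == lf)
  # Guard: bytes[-integer(0)] would drop everything.
  if (length(crlf) > 0) bytes <- bytes[-crlf]
  # Any CR left over is a bare (classic Mac) line ending; make it LF.
  bytes[bytes == cr] <- lf
  writeBin(bytes, out_path)
  invisible(out_path)
}

# Example: one DOS file and one CR-only file, both normalized to LF.
dos_in <- tempfile()
mac_in <- tempfile()
writeBin(charToRaw("a,1\r\nb,2\r\n"), dos_in)
writeBin(charToRaw("a,1\rb,2\r"), mac_in)
dos_out <- normalize_eol(dos_in, tempfile())
mac_out <- normalize_eol(mac_in, tempfile())
```

The normalized copy can then be passed to read.csv() or read.table() as usual.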