I have blank lines between monthly blocks of data in a csv file. I looked at using the blank.lines.skip = TRUE
parameter of fread or read.csv to drop those empty rows.
But what I am getting is a row with an NA at the end. Why is the row not dropped completely?
I did see some old answers on SO about fread crashing when it hits a blank row, but that seems to have been fixed.
TEST CASE
library(data.table)
temp <- data.table(a = c("a", "", "c", "d"),
                   b = c(10, "", 30, 40))
fwrite(temp, "test.csv")
mydata <- fread("test.csv",
                blank.lines.skip = TRUE,
                stringsAsFactors = FALSE)
RESULTS
I get the second row, which is blank, included with an NA added:
> mydata
   a  b
1: a 10
2:   NA
3: c 30
4: d 40
I wanted (and expected):
> mydata
   a  b
1: a 10
2: c 30
3: d 40
(I realise I can get this with mydata[complete.cases(mydata), ], but I expected blank.lines.skip to do this. From the fread help: "If TRUE blank lines in the input are ignored.")
Is fread leaving this line in a bug or a feature?
When you run fwrite(temp, "test.csv"), the second line (not counting the header) is not blank. It contains a separator:
a,b
a,10
,
c,30
d,40
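One way to confirm this is to look at the raw text of the file rather than the parsed table. The sketch below is only an illustration: it writes the five lines shown above directly with writeLines (so the result does not depend on how a particular data.table version writes empty strings), uses a temporary file instead of test.csv, and the variable names (raw_lines, is_empty, is_sep_only) are made up for the example.

```r
# Write exactly the file content shown above to a temporary file
# (tempfile() is an assumption, used to avoid touching a real test.csv).
path <- tempfile(fileext = ".csv")
writeLines(c("a,b", "a,10", ",", "c,30", "d,40"), path)

# Read the file as plain text, with no CSV parsing involved
raw_lines <- readLines(path)
print(raw_lines)

# TRUE for lines with zero characters: the kind blank.lines.skip is about
is_empty <- !nzchar(raw_lines)

# TRUE for non-empty lines made up only of commas and/or whitespace
is_sep_only <- grepl("^[,[:space:]]*$", raw_lines) & !is_empty

which(is_empty)     # no truly empty lines in this file
which(is_sep_only)  # the line holding only the separator
```

Here no line is truly empty, while line 3 of the file is flagged as separator-only, which is why it survives as a row.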
The argument blank.lines.skip is for truly blank lines.
For example, with test.csv as follows (line 3 is completely empty):
a,b
a,10

c,30
d,40
To check:
> dim(fread("test.csv", blank.lines.skip = TRUE))
[1] 3 2
Setting blank.lines.skip = TRUE prevents fread from stopping at the first blank line. Without this argument you would get:
> dim(fread("test.csv"))
[1] 1 2
Warning message:
In fread("test.csv") :
Stopped reading at empty line 3 but text exists afterwards (discarded): c,30
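The comparison above can be reproduced without editing a file by hand. This is a minimal sketch, assuming a temporary file in place of test.csv and hypothetical variable names (skipped, stopped); the exact wording of the warning may differ between data.table versions, so it is suppressed here rather than matched.

```r
library(data.table)

# Build a file whose third line is truly empty (zero characters)
path <- tempfile(fileext = ".csv")
writeLines(c("a,b", "a,10", "", "c,30", "d,40"), path)

# With blank.lines.skip = TRUE the empty line is ignored
skipped <- fread(path, blank.lines.skip = TRUE)
dim(skipped)

# With the default (blank.lines.skip = FALSE), reading stops at the
# empty line; the warning is suppressed only to keep the output short
stopped <- suppressWarnings(fread(path))
dim(stopped)
```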
To solve your problem of blank lines, I would advise to: