fread() blank.lines.skip leaves NA line

I have blank lines between monthly blocks of data in a csv file. I looked at using the blank.lines.skip = TRUE parameter of fread or read.csv to drop those empty rows.

But instead I get a row with an NA at the end. Why is the row not dropped completely?

I did see some older answers on SO about fread crashing on a blank row, but that seems to have been fixed.

TEST CASE

library(data.table)

temp <- data.table(a = c("a", "", "c", "d"),
                   b = c(10, "", 30, 40))

fwrite(temp, "test.csv")

mydata <- fread("test.csv",
                blank.lines.skip = TRUE,
                stringsAsFactors = FALSE)

RESULTS

The second row, which is blank, is still included, with an NA in column b:

> mydata
   a  b
1: a 10
2:   NA
3: c 30
4: d 40

I wanted (and expected):

> mydata
   a  b
1: a 10
2: c 30
3: d 40

(I realise I can get this with mydata[complete.cases(mydata), ], but I expected blank.lines.skip to do it. The fread help says: "If TRUE blank lines in the input are ignored.")

Is fread keeping this line a bug or a feature?

Solution

When you run fwrite(temp, "test.csv"), the second data line (not counting the header) is not blank: it contains a separator:

a,b
a,10
,
c,30
d,40
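One way to confirm this is to read the raw lines back with base R's readLines. A minimal sketch, writing to a temporary file instead of test.csv. Depending on the data.table version, fwrite may write the empty string as a quoted "" rather than a bare empty field, but either way the line is not empty:

```r
library(data.table)

# Recreate the file from the question in a temporary location
temp <- data.table(a = c("a", "", "c", "d"),
                   b = c(10, "", 30, 40))
f <- tempfile(fileext = ".csv")
fwrite(temp, f)

# Inspect the raw text: every line holds at least a separator,
# so there is nothing for blank.lines.skip to skip
lines <- readLines(f)
print(lines)
which(lines == "")   # integer(0): no truly blank line

# fread therefore keeps all four data rows
mydata <- fread(f, blank.lines.skip = TRUE)
nrow(mydata)         # 4
```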

The argument blank.lines.skip is for truly blank lines.

For example, with test.csv as follows:

a,b
a,10

c,30
d,40

To check:

> dim(fread("test.csv", blank.lines.skip = TRUE))
[1] 3 2

The argument blank.lines.skip = TRUE keeps fread from stopping at the first blank line. Without it you would get:

> dim(fread("test.csv"))
[1] 1 2
Warning message:
In fread("test.csv") :
  Stopped reading at empty line 3 but text exists afterwards (discarded): c,30
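Since fwrite writes at least the separators for every row, it will not produce a truly blank line from a data.table. To reproduce that case without editing the file by hand, base R's writeLines can write it directly (a minimal sketch; the temporary file simply stands in for test.csv):

```r
library(data.table)

# Write the example file line by line; "" becomes a truly empty line
f <- tempfile(fileext = ".csv")
writeLines(c("a,b", "a,10", "", "c,30", "d,40"), f)

# With blank.lines.skip = TRUE the empty line is ignored entirely
dt <- fread(f, blank.lines.skip = TRUE)
nrow(dt)   # 3
dt$b       # 10 30 40
```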

Edit:

To deal with these separator-only lines, I would advise you to:

Strip them from your file before reading your data if you have a lot of those lines.
Drop them after reading if you have just a few of them.
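Both options can be sketched in R, assuming the file uses "," as its separator. The functions read_skipping_empty_rows and drop_empty_rows below are hypothetical helpers, not part of data.table; they rely only on readLines, grepl, fread(text = ...), lapply and Reduce. The sample file is written by hand so the result does not depend on how a given fwrite version quotes empty strings:

```r
library(data.table)

# Sample file containing a separator-only line
f <- tempfile(fileext = ".csv")
writeLines(c("a,b", "a,10", ",", "c,30", "d,40"), f)

# Option 1 (hypothetical helper): strip empty-looking lines before parsing.
# A line counts as empty if it holds only commas, double quotes or whitespace.
read_skipping_empty_rows <- function(file) {
  lines <- readLines(file)
  keep <- !grepl('^[,"[:space:]]*$', lines)
  fread(text = lines[keep])
}

# Option 2 (hypothetical helper): read normally, then drop rows in which
# every field is NA or an empty string.
drop_empty_rows <- function(dt) {
  empty <- Reduce(`&`, lapply(dt, function(col) is.na(col) | as.character(col) == ""))
  dt[which(!empty)]
}

dt1 <- read_skipping_empty_rows(f)
dt2 <- drop_empty_rows(fread(f))
nrow(dt1)   # 3
nrow(dt2)   # 3
```

Unlike complete.cases, drop_empty_rows only removes rows where every field is missing, so a row with a single legitimately missing value is kept. Note that both helpers would also drop a genuine record whose fields are all empty strings, which may or may not be what you want.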