
fread() blank.lines.skip leaves NA line


I have blank lines between monthly blocks of data in a CSV file. I tried the blank.lines.skip = TRUE parameter of fread (read.csv has the same parameter) to drop those empty rows.

But what I get is a line with an NA at the end. Why is the row not dropped completely?

I did see some old answers on SO about fread crashing when the file has a blank row, but that seems to have been fixed.

TEST CASE

library(data.table)

# Both columns are character, because b mixes numbers with ""
temp <- data.table(a = c("a", "", "c", "d"),
                   b = c(10, "", 30, 40))

fwrite(temp, "test.csv")



mydata <- fread("test.csv", 
                blank.lines.skip = TRUE,
                stringsAsFactors = FALSE)

RESULTS

The blank second row is still included, with an NA added:

> mydata
   a  b
1: a 10
2:   NA
3: c 30
4: d 40

I wanted (and expected):

> mydata
   a  b
1: a 10
2: c 30
3: d 40

(I realise I can get this with mydata[complete.cases(mydata), ], but I expected blank.lines.skip to do it. From the fread help: "If TRUE blank lines in the input are ignored.")

Is fread keeping this line a bug or a feature?


Solution

  • When you run fwrite(temp, "test.csv"), the second data line (not counting the header) is not blank. It contains a separator:

    a,b
    a,10
    ,
    c,30
    d,40
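
One way to confirm this is to look at the raw text of the file instead of the parsed table. The following is a minimal sketch in base R; the file is written by hand with writeLines so that its content matches the listing above regardless of how a particular data.table version writes empty strings, and the names path, is_blank and is_sep_only are only illustrative.

```r
# Write the file exactly as shown above into a temporary location.
path <- tempfile(fileext = ".csv")
writeLines(c("a,b", "a,10", ",", "c,30", "d,40"), path)

raw_lines <- readLines(path)

# A truly blank line has no characters left after trimming whitespace.
is_blank <- !nzchar(trimws(raw_lines))

# A separator-only line contains at least one comma and nothing else
# except optional whitespace.
is_sep_only <- grepl("^[[:space:]]*,[,[:space:]]*$", raw_lines)

print(data.frame(line = raw_lines, is_blank, is_sep_only))
```

Line 3 of the file is flagged as separator-only rather than blank, which is why blank.lines.skip does not apply to it.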
    

    The argument blank.lines.skip only applies to truly blank lines.

    For example, with test.csv as follows:

    a,b
    a,10
    
    c,30
    d,40
    

    To check:

    > dim(fread("test.csv", blank.lines.skip = TRUE))
    [1] 3 2
    

    Setting blank.lines.skip = TRUE keeps fread from stopping at the first blank line. Without it you would get:

    > dim(fread("test.csv"))
    [1] 1 2
    Warning message:
    In fread("test.csv") :
      Stopped reading at empty line 3 but text exists afterwards (discarded): c,30
    

    Edit:

    To solve your problem with these lines, I would advise you to:

    • Strip them from the file before reading the data, if there are many of them.
    • Drop them after reading, if there are only a few.
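
Both options can be sketched roughly as follows. This is only an illustration, assuming a comma-separated file small enough to hold in memory and a data.table version that supports fread(text = ...); the helper is_empty and the variable names are hypothetical, not part of data.table.

```r
library(data.table)

# Sample file with a separator-only line, as in the question.
path <- tempfile(fileext = ".csv")
writeLines(c("a,b", "a,10", ",", "c,30", "d,40"), path)

# Option 1: strip blank and separator-only lines before parsing.
raw_lines <- readLines(path)
# Keep lines that contain at least one character other than a comma or space.
keep <- grepl("[^,[:space:]]", raw_lines)
cleaned <- fread(text = raw_lines[keep])

# Option 2: read everything, then drop rows where every field is NA or "".
mydata <- fread(path)
is_empty <- function(x) is.na(x) | as.character(x) == ""
empty_row <- Reduce(`&`, lapply(mydata, is_empty))
mydata <- mydata[!empty_row]

print(cleaned)
print(mydata)
```

Unlike complete.cases, the second option only removes rows in which every column is empty, so rows with a single genuine missing value are kept.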