r data-import read.csv

How can I quickly find all the files in a directory that are missing a first row?


I have a folder of .csv files. They contain blank lines that are necessary (a blank line indicates that the LiDAR unit recorded no measurement, which is good and needs to stay in). Occasionally, though, the first row is empty. This throws off the code and the package, and everything aborts.

Right now I have to open each .csv and see if the first line is empty.

I would like to do one of the following, but am at a loss how to:

1) write code that quickly scans through all of the files in the directory and tells me which ones are missing the first line

2) skip the empty lines that occur only at the beginning; the number varies, and sometimes more than one line is empty

3) write code that cycles through all of the .csv files and inserts a dummy first line of numbers so the files all import without a problem.

Thanks!


Solution

  • Here's a bit of code that does 1 and 2 above. I'm not sure why you'd want to insert dummy line(s) given the ability to do 1 and 2; it's straightforward to do, but usually it's not a good idea to modify raw data files.

    # Create some test files
    cat("x,y", "1,2", sep="\n", file = "blank0.csv")
    cat("", "x,y", "1,2", sep="\n", file = "blank1.csv")
    cat("", "", "x,y", "1,2", sep="\n", file = "blank2.csv")
    
    
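    # Match file names ending in ".csv" (pattern is a regular expression, not a glob)
    files <- list.files(pattern = "\\.csv$", full.names = TRUE)
    
    for(i in seq_along(files)) {
      filedata <- readLines(files[i])
      # Number of empty lines before the first non-empty line
      lines_to_skip <- min(which(filedata != "")) - 1
      cat(i, files[i], lines_to_skip, "\n")
      # Skip only the leading empty lines when importing
      x <- read.csv(files[i], skip = lines_to_skip)
    }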
    

    This prints

    1 ./blank0.csv 0 
    2 ./blank1.csv 1 
    3 ./blank2.csv 2 
    

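    and reads in each dataset correctly. Note that x is overwritten on every iteration, so store the results in a list if you need to keep all of them.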
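
If you want the result of option 1 as a table rather than printed output, or you need something like option 3 without touching the raw files, the same idea can be wrapped in functions. This is only a minimal sketch: count_leading_blanks(), report_leading_blanks(), write_trimmed_copies() and the "trimmed" output folder are hypothetical names chosen for illustration. It also assumes that a whitespace-only line should count as blank, and it writes copies with the leading blank lines removed instead of inserting a dummy line.

```r
# Count the blank (empty or whitespace-only) lines at the top of one file.
# Hypothetical helper name; treating whitespace-only lines as blank is an assumption.
count_leading_blanks <- function(path) {
  lines <- readLines(path, warn = FALSE)
  nonblank <- which(trimws(lines) != "")
  if (length(nonblank) == 0) return(NA_integer_)  # file has no content at all
  nonblank[1] - 1L
}

# Build a one-row-per-file report for every .csv directly inside `dir`.
report_leading_blanks <- function(dir = ".") {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  data.frame(
    file = files,
    leading_blanks = vapply(files, count_leading_blanks, integer(1),
                            USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Write copies with the leading blank lines removed into a separate folder.
# The raw files are left untouched and interior blank lines are kept.
write_trimmed_copies <- function(dir = ".", out_dir = file.path(dir, "trimmed")) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  report <- report_leading_blanks(dir)
  for (i in seq_len(nrow(report))) {
    n <- report$leading_blanks[i]
    if (is.na(n)) next  # nothing to write for files with no content
    lines <- readLines(report$file[i], warn = FALSE)
    kept <- if (n > 0) lines[-seq_len(n)] else lines
    writeLines(kept, file.path(out_dir, basename(report$file[i])))
  }
  invisible(report)
}
```

Calling report_leading_blanks(".") returns a data frame you can filter, for example with subset(report, leading_blanks > 0), to list only the problem files. write_trimmed_copies(".") then gives you a parallel folder of files that start on their first non-blank line.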