
read.csv() in R "no lines available in input" error


I am trying to loop through a directory and read all of its files into a list. The files all come from the same GitHub repo, found here: https://github.com/CSSEGISandData/COVID-19

path = "~/Documents/Corona_Virus/COVID-19/archived_data/archived_daily_case_updates/"
setwd(path)
file.names<-list.files(path)
archived_DAYS<-lapply(file.names,read.csv,sep=",",header=T)

goes off without a hitch, but then

path2 = "~/Documents/Corona_Virus/COVID-19/csse_covid_19_data/csse_covid_19_daily_reports/"
setwd(path2)
daily_file_names<-list.files(path2)
daily_DAYS<-lapply(daily_file_names,read.csv,sep=",")

throws the error

"Error in read.table(file = file, header = header, sep = sep, quote = quote, : no lines available in input"

However, the files in both directories are all .csv files with the same structure. I don't see why it throws that error, since every file contains data.


Solution

  • To read the files locally in R, one can do the following.

    1. Fork the COVID-19 repository on GitHub.
    2. Clone the repository to the machine on which you'll run RStudio / R.
    3. In RStudio, create a project in the root directory of the cloned COVID-19 repository.

    At this point the current R working directory is the root directory of the cloned GitHub repository. The following code lists all the archived daily files and reads them into a list of data frames.

    # 
    # archived days data
    # 
    theFiles <- list.files("./archived_data/archived_daily_case_updates",pattern="*.csv",full.names = TRUE)
    
    dataList <- lapply(theFiles,read.csv,stringsAsFactors=FALSE)
    

    We can print the first few rows of data from the first data frame in the resulting list as follows.

    > head(dataList[[1]])
      ï..Province.State Country.Region    Last.Update Confirmed Deaths Recovered Suspected
    1             Anhui Mainland China 1/21/2020 10pm        NA     NA        NA         3
    2           Beijing Mainland China 1/21/2020 10pm        10     NA        NA        NA
    3         Chongqing Mainland China 1/21/2020 10pm         5     NA        NA        NA
    4         Guangdong Mainland China 1/21/2020 10pm        17     NA        NA         4
    5           Guangxi Mainland China 1/21/2020 10pm        NA     NA        NA         1
    6           Guizhou Mainland China 1/21/2020 10pm        NA     NA        NA         1
    > 
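
    The ï.. prefix on the first column name is typical of a UTF-8 byte-order mark (BOM) at the start of a file being read as ordinary characters. As a hedged sketch, assuming that is what happened here, passing fileEncoding = "UTF-8-BOM" to read.csv() drops the mark. The temporary file and its contents below are made up for illustration and are not taken from the repository.

    ```r
    # Hypothetical demo file: a tiny CSV that starts with a UTF-8 BOM (EF BB BF).
    bom_file <- tempfile(fileext = ".csv")
    bom_bytes <- as.raw(c(0xEF, 0xBB, 0xBF))
    csv_bytes <- charToRaw("Province/State,Confirmed\nAnhui,1\nBeijing,14\n")
    writeBin(c(bom_bytes, csv_bytes), bom_file)

    # "UTF-8-BOM" tells R to discard the byte-order mark while reading.
    clean <- read.csv(bom_file, fileEncoding = "UTF-8-BOM", stringsAsFactors = FALSE)
    names(clean)
    ```

    The same argument can be passed through lapply(), for example lapply(theFiles, read.csv, fileEncoding = "UTF-8-BOM", stringsAsFactors = FALSE).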
    

    Note that the full.names = TRUE argument in list.files() is needed to include the path in the resulting list of file names.

    > # show path names in list of files
    > head(theFiles)
    [1] "./archived_data/archived_daily_case_updates/01-21-2020_2200.csv"
    [2] "./archived_data/archived_daily_case_updates/01-22-2020_1200.csv"
    [3] "./archived_data/archived_daily_case_updates/01-23-2020_1200.csv"
    [4] "./archived_data/archived_daily_case_updates/01-24-2020_0000.csv"
    [5] "./archived_data/archived_daily_case_updates/01-24-2020_1200.csv"
    [6] "./archived_data/archived_daily_case_updates/01-25-2020_0000.csv"
    >
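
    To illustrate why the path matters, here is a small sketch that uses a throwaway directory under tempdir() rather than the repository; the directory and file names are invented. Without full.names = TRUE, list.files() returns bare file names, which read.csv() can only open if the working directory happens to be that folder.

    ```r
    # Hypothetical demo directory with one small CSV file.
    demo_dir <- file.path(tempdir(), "full_names_demo")
    dir.create(demo_dir, showWarnings = FALSE)
    write.csv(data.frame(x = 1:3),
              file.path(demo_dir, "full_names_demo_file.csv"),
              row.names = FALSE)

    bare_names <- list.files(demo_dir)                     # file names only
    full_paths <- list.files(demo_dir, full.names = TRUE)  # directory + file name

    file.exists(bare_names)  # usually FALSE: resolved against the working directory
    file.exists(full_paths)  # the path points at the file regardless of getwd()

    # Reading via the full paths works without calling setwd().
    demo_list <- lapply(full_paths, read.csv, stringsAsFactors = FALSE)
    ```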
    

    What caused the error in the original post?

    In the comments to my answer, the original poster asked why the code for the daily case updates failed. My hypothesis was that a README.md file in the subdirectory caused read.csv() to fail. Because my answer uses pattern = '*.csv' in list.files(), it never passes a non-CSV file to read.csv().

    I ran the following code to test this hypothesis.

    # replicate original error
    originalDirectory <- getwd()
    path2 <- paste0(originalDirectory, "/csse_covid_19_data/csse_covid_19_daily_reports")
    setwd(path2)
    # no pattern argument, so every file in the directory is listed, CSV or not
    daily_file_names <- list.files(path2)
    daily_DAYS <- lapply(daily_file_names, read.csv, sep = ",")
    

    I received the same error as documented in the original post.

    > # replicate original error
    > originalDirectory <- getwd()
    > path2 =paste0(originalDirectory, "/csse_covid_19_data/csse_covid_19_daily_reports")
    > setwd(path2)
    > daily_file_names<-list.files(path2)
    > daily_DAYS<-lapply(daily_file_names,read.csv,sep=",")
    Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
      no lines available in input
    > 
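
    read.csv() reports "no lines available in input" when it is handed a file with no lines at all. The sketch below reproduces the message outside the repository, assuming (this is not verified here) that the README.md in the daily reports folder was empty at the time. The directory and file names are invented for the demo, and tryCatch() is used so that the loop reports which file failed instead of stopping at the first error.

    ```r
    # Hypothetical demo directory: one valid CSV plus an empty README.md.
    demo_dir <- file.path(tempdir(), "empty_file_demo")
    dir.create(demo_dir, showWarnings = FALSE)
    write.csv(data.frame(x = 1:3),
              file.path(demo_dir, "01-22-2020.csv"),
              row.names = FALSE)
    file.create(file.path(demo_dir, "README.md"))  # zero-byte file

    demo_files <- list.files(demo_dir, full.names = TRUE)

    # Try each file; keep the error message instead of aborting the whole lapply().
    results <- lapply(demo_files, function(f) {
      tryCatch(read.csv(f), error = function(e) conditionMessage(e))
    })
    names(results) <- basename(demo_files)

    # Entries that are character strings are the files that failed to read.
    failed <- Filter(is.character, results)
    failed
    ```

    Wrapping the read in tryCatch() like this is a quick way to find the offending file when a directory holds many inputs.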
    

    After adding pattern = '*.csv' to list.files(), the code works correctly.

    > # use pattern = "*.csv"
    > daily_file_names<-list.files(path2,pattern = "*.csv")
    > daily_DAYS<-lapply(daily_file_names,read.csv,sep=",")
    > head(daily_DAYS[[1]])
      ï..Province.State Country.Region     Last.Update Confirmed Deaths Recovered
    1             Anhui Mainland China 1/22/2020 17:00         1     NA        NA
    2           Beijing Mainland China 1/22/2020 17:00        14     NA        NA
    3         Chongqing Mainland China 1/22/2020 17:00         6     NA        NA
    4            Fujian Mainland China 1/22/2020 17:00         1     NA        NA
    5             Gansu Mainland China 1/22/2020 17:00        NA     NA        NA
    6         Guangdong Mainland China 1/22/2020 17:00        26     NA        NA
    >
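
    One caveat worth noting: the pattern argument of list.files() is a regular expression, not a shell-style wildcard. "*.csv" does the job in the runs above, but a stricter way to select files whose names end in .csv is "\\.csv$", or the result of glob2rx("*.csv"), which converts the wildcard into a regular expression. The following sketch uses invented file names in a temporary directory and combines the pattern with full.names = TRUE so that no setwd() call is needed.

    ```r
    # Hypothetical demo directory with a CSV, a README and a backup file.
    demo_dir <- file.path(tempdir(), "pattern_demo")
    dir.create(demo_dir, showWarnings = FALSE)
    write.csv(data.frame(x = 1:2),
              file.path(demo_dir, "01-22-2020.csv"),
              row.names = FALSE)
    file.create(file.path(demo_dir, "README.md"))
    file.create(file.path(demo_dir, "01-22-2020.csv.bak"))

    # Anchored regular expression: the file name must end in ".csv".
    csv_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

    # glob2rx() (from utils) translates a wildcard into a regular expression.
    same_files <- list.files(demo_dir, pattern = glob2rx("*.csv"), full.names = TRUE)

    # Only the real CSV file is read; README.md and the .bak file are skipped.
    daily_list <- lapply(csv_files, read.csv, stringsAsFactors = FALSE)
    basename(csv_files)
    ```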