Import big CSV files at once in R


I have 70 CSV files with the same columns in a folder, each about 0.5 GB. I want to import them into a single data frame in R.

I can import each file individually without problems, like this:

df <- read_delim("file.csv", "|", escape_double = FALSE,
                 col_types = cols(pc_no = col_character(), id_key = col_character()),
                 trim_ws = TRUE)

To import all of them at once, I wrote the code below, but it fails with the error: argument "delim" is missing, with no default

tbl <-
list.files(pattern = "*.csv") %>% 
map_df(~read_delim("|", escape_double = FALSE, col_types = cols(pc_no = col_character(), id_key = col_character()), trim_ws = TRUE))

With read_csv the import runs, but the result has only one column, which contains all the columns and values.

 tbl <-
 list.files(pattern = "*.csv") %>% 
 map_df(~read_csv(., col_types = cols(.default = "c")))

Solution

  • In your second block of code, you're missing the . (the placeholder for the file name that map_df passes in), so read_delim is interpreting your arguments as read_delim(file="|", delim=<nothing provided>, ...). Try:

    tbl <- list.files(pattern = "*.csv") %>% 
      map_df(~ read_delim(., delim = "|", escape_double = FALSE,
                          col_types = cols(pc_no = col_character(), id_key = col_character()),
                          trim_ws = TRUE))
    

    I explicitly named delim= here, but it's not strictly necessary. Had you named it in your failing map_df attempt, however, you would have seen

    readr::read_delim(delim = "|", escape_double = FALSE,
                      col_types = cols(pc_no = col_character(), id_key = col_character()),
                      trim_ws = TRUE)
    # Error in read_delimited(file, tokenizer, col_names = col_names, col_types = col_types,  : 
    #   argument "file" is missing, with no default
    

    which is more indicative of the actual problem.
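
If you want to try this without the original data, here is a minimal self-contained sketch. The folder name, file names, and sample rows are made up for illustration; only the read_delim call mirrors the code above. It also suggests why the read_csv attempt produced a single column: read_csv splits on commas, and these files are pipe-delimited, so each line is read as one field.

```r
library(readr)
library(purrr)

# Hypothetical sample data: two tiny pipe-delimited files in a temporary folder
demo_dir <- file.path(tempdir(), "pipe_csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("pc_no|id_key|value", "001|A1|10", "002|A2|20"),
           file.path(demo_dir, "part1.csv"))
writeLines(c("pc_no|id_key|value", "003|A3|30"),
           file.path(demo_dir, "part2.csv"))

# pattern is a regular expression; full.names = TRUE returns usable paths
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file with the pipe delimiter and row-bind the results
tbl <- files %>%
  map_df(~ read_delim(., delim = "|", escape_double = FALSE,
                      col_types = cols(pc_no = col_character(),
                                       id_key = col_character()),
                      trim_ws = TRUE))

# For comparison: read_csv expects commas, so the whole line lands in one column
one_col <- read_csv(files[1], col_types = cols(.default = "c"))

dim(tbl)
names(one_col)
```

Note that the pattern argument of list.files is a regular expression, not a glob, so "\\.csv$" is a tighter match than "*.csv". With full.names = TRUE the paths also work when the files are not in the current working directory.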