How to use col_types with readr's read_delim_chunked?


I am trying to read a file in chunks while specifying col_types. Here is a minimal working example:

write.csv(cars, "cars.csv")


library(readr)
readr::read_delim_chunked("cars.csv", function(x, i) {
  x
}, delim= ",", col_types = cols(
  speed = col_character()
), chunk_size = 10)

but the only output I get is

NULL

but the non-chunked version works fine

library(readr)
readr::read_delim("cars.csv", delim= ",", col_types = cols(
  speed = col_character()
))

Solution

  • One issue is that write.csv includes the row names as an extra, unnamed first column by default. Writing the file with row.names = FALSE leaves that column out:

    write.csv(cars, "cars.csv", row.names = FALSE, quote = FALSE)
    

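    Also, each entry in cols() must be a call such as col_character(), not the bare function name col_character. To get the data back, wrap the callback in DataFrameCallback$new(). A plain function is treated as a side-effect callback, so its return values are discarded and read_delim_chunked returns NULL: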

    library(readr)

    # DataFrameCallback keeps each chunk's return value and row-binds them
    readr::read_delim_chunked("cars.csv", DataFrameCallback$new(function(x, i) {
      x
    }), col_types = cols(
      speed = col_character()
    ), delim = ",", chunk_size = 10)
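
To check the difference between the two callback styles, something like the sketch below may help. It runs the same read twice, once with a plain function and once with DataFrameCallback, and compares what comes back. The names path, types, side_effect and collected are illustrative only and are not part of the question; the file goes to a temporary location rather than "cars.csv".

```r
library(readr)

# Write the example data without the row-name column (temporary file, assumed path).
path <- tempfile(fileext = ".csv")
write.csv(cars, path, row.names = FALSE, quote = FALSE)

# Read speed as character; the remaining column is guessed.
types <- cols(speed = col_character())

# A plain function is run for its side effects only, so nothing is collected.
side_effect <- read_delim_chunked(
  path, function(x, pos) x,
  delim = ",", col_types = types, chunk_size = 10
)

# DataFrameCallback combines whatever each chunk call returns into one data frame.
collected <- read_delim_chunked(
  path, DataFrameCallback$new(function(x, pos) x),
  delim = ",", col_types = types, chunk_size = 10
)

is.null(side_effect)      # TRUE
class(collected$speed)    # "character"
nrow(collected)           # 50, the same as nrow(cars)
```

If the callback filters or summarises each chunk instead of returning x unchanged, collected would hold only the filtered or summarised rows, which is the usual reason for reading in chunks in the first place.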