Writing and reading the class of columns to CSV


For a data frame, I'd like to save the class of each column (e.g. character, double, factor) to a CSV file, and then be able to read both the data and the classes back into R.

For example, my data might look like this:

df
#> # A tibble: 3 × 3
#>    item  cost blue 
#>   <int> <int> <fct>
#> 1     1     4 1    
#> 2     2    10 1    
#> 3     3     3 0

(Code to create the data:)

library(tidyverse)
df <- tibble::tribble(
  ~item, ~cost, ~blue,
     1L,    4L,    1L,
     2L,   10L,    1L,
     3L,    3L,    0L
  )

df <- df %>% 
  mutate(blue = as.factor(blue))
df

I'm able to save the column classes and the data this way:

library(tidyverse)
classes <- map_df(df, class)

write_csv(classes, "classes.csv")
write_csv(df, "data.csv")

and I can read it back this way:

classes <- read.csv("classes.csv") %>% 
  slice(1) %>% 
  unlist()
classes
df2 <- read_csv("data.csv", col_types = classes)
df2

Is there a quicker way to do all of this, particularly the way I'm saving the classes and then reading them back in, slicing and unlisting?


Solution

  • You could use writeLines() and its counterpart readLines() for the classes, like this:

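    # Get the class of each column (assumes one class per column)
    classes <- sapply(df, class)
    # Write one class name per line
    writeLines(classes, "classes.txt")
    # Read them back as a character vector
    classes <- readLines("classes.txt")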
    

    However, also consider formats such as Parquet (available in R through the arrow package), which preserve the data types and are supported in many languages.
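
For completeness, here is a minimal sketch of the full round trip with the writeLines()/readLines() approach. It is not the asker's readr pipeline: it swaps in base R's write.csv() and read.csv(), because read.csv() accepts class names such as "integer" and "factor" directly through its colClasses argument. The temp-file paths and the names classes_file, data_file, classes2 and df2 are illustrative assumptions, and the sketch assumes every column has a single simple class (integer, numeric, character, logical or factor).

```r
# Example data matching the question (base R only, no packages needed)
df <- data.frame(
  item = 1:3,
  cost = c(4L, 10L, 3L),
  blue = factor(c(1L, 1L, 0L))
)

# Temp files for illustration; replace with real paths as needed
classes_file <- tempfile(fileext = ".txt")
data_file <- tempfile(fileext = ".csv")

# Save one class name per column (first class only, if a column has several)
classes <- vapply(df, function(col) class(col)[1], character(1))
writeLines(classes, classes_file)
write.csv(df, data_file, row.names = FALSE)

# Read the classes back and hand them to read.csv() via colClasses
classes2 <- readLines(classes_file)
df2 <- read.csv(data_file, colClasses = classes2)

# Inspect the restored column classes
print(sapply(df2, class))
```

With readr's read_csv(), col_types expects a cols() specification or a compact string such as "iif" rather than full class names, so the saved class names would first need to be mapped to those codes.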
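
The Parquet suggestion could look roughly like the sketch below. It assumes the arrow package is installed; the file path and the names parquet_file and df2 are illustrative. Because the column types are stored in the file itself, no separate classes file is needed.

```r
library(arrow)  # assumption: the arrow package is installed

# Example data matching the question
df <- data.frame(
  item = 1:3,
  cost = c(4L, 10L, 3L),
  blue = factor(c(1L, 1L, 0L))
)

# Temp file for illustration; replace with a real path as needed
parquet_file <- tempfile(fileext = ".parquet")

# Write the data frame, types included, to a single Parquet file
write_parquet(df, parquet_file)

# Read it back; the column types come from the file's own schema
df2 <- read_parquet(parquet_file)

# Inspect the restored column classes
print(sapply(df2, class))
```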