For a data frame, I'd like to save the class of each column (e.g. character, double, factor) to a CSV file, and then be able to read both the data and the classes back into R.
For example, my data might look like this:
df
#> # A tibble: 3 × 3
#>    item  cost blue
#>   <int> <int> <fct>
#> 1     1     4 1
#> 2     2    10 1
#> 3     3     3 0
(Code to create the data:)
library(tidyverse)
df <- tibble::tribble(
  ~item, ~cost, ~blue,
     1L,    4L,    1L,
     2L,   10L,    1L,
     3L,    3L,    0L
)
df <- df %>%
  mutate(blue = as.factor(blue))
df
I'm able to save the classes of the data, and the data, this way:
library(tidyverse)
classes <- map_df(df, class)
write_csv(classes, "classes.csv")
write_csv(df, "data.csv")
and I can read them back this way:
classes <- read.csv("classes.csv") %>%
  slice(1) %>%
  unlist()
classes
df2 <- read_csv("data.csv", col_types = classes)
df2
Is there a quicker way to do all of this? In particular, is there a simpler way to save the classes than reading them back in and then slicing and unlisting?
You could use writeLines and its counterpart readLines for the classes, like this:
classes <- sapply(df, class)
writeLines(classes, "classes.txt")
# to read them back (readLines returns an unnamed character vector, in column order)
readLines("classes.txt")
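Because readLines gives the classes back in column order, they can be passed straight to base R's read.csv(), whose colClasses argument accepts class names such as "integer", "numeric" and "factor" and matches an unnamed vector to the columns by position. The sketch below swaps readr's write_csv()/read_csv() for base R's write.csv()/read.csv() for that reason; it is one possible approach under that assumption, not the only one. It also uses class(col)[1] instead of class() so that each column contributes exactly one line, even if it carries several classes.

```r
# Round trip with base R only: one class name per line, in column order,
# fed back to read.csv() through colClasses.
df <- data.frame(
  item = c(1L, 2L, 3L),
  cost = c(4L, 10L, 3L),
  blue = factor(c(1L, 1L, 0L))
)

# Save one class name per column (class(col)[1] keeps a single entry
# per column even for multi-class columns)
classes <- vapply(df, function(col) class(col)[1], character(1))
writeLines(classes, "classes.txt")
write.csv(df, "data.csv", row.names = FALSE)

# Read both back; the unnamed vector from readLines() is matched to the
# columns by position, so no slicing or unlisting is needed
df2 <- read.csv("data.csv", colClasses = readLines("classes.txt"))
str(df2)
```

Since the classes file has no header, reading it back is a single readLines() call instead of read.csv() followed by slice() and unlist().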
However, also consider other formats, such as Parquet (implemented in R by the arrow package), that preserve the data types and are supported by many languages.
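A minimal sketch of the Parquet route, assuming the arrow package is installed: the column types are stored inside the file, so there is no separate classes file to write, read, slice or unlist. Factor columns are stored as Arrow dictionary columns and come back into R as factors.

```r
# Requires the arrow package: install.packages("arrow")
library(arrow)

df <- data.frame(
  item = c(1L, 2L, 3L),
  cost = c(4L, 10L, 3L),
  blue = factor(c(1L, 1L, 0L))
)

# The schema (column types) is written together with the data
write_parquet(df, "data.parquet")
df2 <- read_parquet("data.parquet")

# item and cost come back as integer, blue as a factor
sapply(df2, class)
```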