I have a data frame/tibble df1 with each of hundreds of columns/variables painstakingly set to the correct data type (double, char, date, time, logical).
I'm periodically provided a df2 that I need to append to df1. df2 has identical variable names, count, and order as df1, but column data types do not necessarily match those of df1 (due to the source for df2 variables sometimes missing, and therefore not recognized as e.g. date or time). df2 is provided "as is": its import is out of my control.
Is there a (preferably tidyverse) solution for converting every df2 column to the data type of its corresponding df1 column, so that I can go on to merge the data frames with bind_rows etc.? I'm trying to avoid hardcoding if possible.
Sample data:
mt1 <- mtcars[1:3,]
mt2 <- mtcars[1:3,]
class(mt2$cyl) <- "character"
sapply(mt2, class)
#         mpg         cyl        disp          hp        drat          wt        qsec          vs          am
#   "numeric" "character"   "numeric"   "numeric"   "numeric"   "numeric"   "numeric"   "numeric"   "numeric"
#        gear        carb
#   "numeric"   "numeric"
The simplest:
mt2fixed <- Map(function(this, oth) `class<-`(this, class(oth)), mt2, mt1) |>
  as.data.frame()
sapply(mt2fixed, class)
#       mpg       cyl      disp        hp      drat        wt      qsec        vs        am      gear      carb
# "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
Calling `class<-` as a function in a single expression is equivalent to reassigning the class and then returning the updated vector. For instance, these two are equivalent:
`class<-`(vec, newclass)
{ class(vec) <- newclass; vec; }
The main practical difference is that the first form allows a shorter anonymous function, with no need for surrounding braces. (The same applies to the dplyr solution below.)
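As a quick check of that equivalence, here is a minimal sketch; `vec`, `a`, `b`, and `tmp` are names made up for the illustration:

```r
vec <- c("1", "2", "3")

# Functional form: returns the re-classed vector directly.
a <- `class<-`(vec, "numeric")

# Braced form: reassign the class on a copy, then return it.
b <- { tmp <- vec; class(tmp) <- "numeric"; tmp }

identical(a, b)
# [1] TRUE
```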
If done in-place, it can be a little less verbose:
```r
mt2[] <- Map(function(this, oth) `class<-`(this, class(oth)), mt2, mt1)
```
The use of mt2[] on the LHS of the assignment ensures that the overall class of "data.frame" is preserved (otherwise it'll be a list).
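To see the difference the empty brackets make, a small sketch using the same sample data (`as_list` is just an illustrative name):

```r
mt1 <- mtcars[1:3, ]
mt2 <- mtcars[1:3, ]
class(mt2$cyl) <- "character"

# Map() on its own returns a plain named list.
as_list <- Map(function(this, oth) `class<-`(this, class(oth)), mt2, mt1)
class(as_list)
# [1] "list"

# Assigning into mt2[] replaces the columns but keeps the data.frame shell.
mt2[] <- as_list
class(mt2)
# [1] "data.frame"
```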
With dplyr:
library(dplyr)
tibble(mt2)
# # A tibble: 3 × 11
#     mpg cyl    disp    hp  drat    wt  qsec    vs    am  gear  carb
#   <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1  21   6       160   110  3.9   2.62  16.5     0     1     4     4
# 2  21   6       160   110  3.9   2.88  17.0     0     1     4     4
# 3  22.8 4       108    93  3.85  2.32  18.6     1     1     4     1
tibble(mt2) %>%
  mutate(across(everything(), ~ `class<-`(.x, class(mt1[[cur_column()]]))))
# # A tibble: 3 × 11
#     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1  21       6   160   110  3.9   2.62  16.5     0     1     4     4
# 2  21       6   160   110  3.9   2.88  17.0     0     1     4     4
# 3  22.8     4   108    93  3.85  2.32  18.6     1     1     4     1
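One caveat worth checking against your real data: as far as I know, `class<-` only coerces the underlying values for basic types such as "numeric", "character", or "logical". For classes like "Date" it merely sets the class attribute and leaves the values as they were. Since df1 has date and time columns, a type-aware variant may be safer. Below is a minimal sketch, not a definitive implementation: `match_type` is a hypothetical helper (not part of base R or dplyr), the sample `df1`/`df2` are made up, and it assumes dates arrive as "YYYY-MM-DD" strings. Time columns would need a similar branch suited to whatever class you use for them.

```r
# Hypothetical helper: coerce `x` to the type of `template`.
match_type <- function(x, template) {
  if (inherits(template, "Date")) {
    as.Date(as.character(x))           # assumes "YYYY-MM-DD" strings
  } else if (is.factor(template)) {
    factor(x, levels = levels(template))
  } else {
    `class<-`(x, class(template))      # basic types: numeric, character, logical
  }
}

# Made-up sample data: df2 arrives with everything as character.
df1 <- data.frame(n = c(1.5, 2.5),
                  d = as.Date(c("2024-01-01", "2024-01-02")),
                  stringsAsFactors = FALSE)
df2 <- data.frame(n = c("3.5", "4.5"),
                  d = c("2024-01-03", "2024-01-04"),
                  stringsAsFactors = FALSE)

df2[] <- Map(match_type, df2, df1)
sapply(df2, class)
#         n         d
# "numeric"    "Date"
```

Once the column types line up, the two data frames can be appended with rbind(df1, df2) or dplyr's bind_rows(df1, df2).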