r, dplyr, tidyverse

Set all column types in one data frame to the column types of another data frame (dplyr/R)


I have a data frame/tibble df1 with each of hundreds of columns/variables painstakingly set to the correct data type (double, char, date, time, logical).

I'm periodically provided a df2 that I need to append to df1. df2 has the same variable names, count, and order as df1, but its column data types do not necessarily match those of df1 (because source values for some df2 variables are sometimes missing, so those columns are not recognized as, e.g., date or time). df2 is provided "as is": its import is out of my control.

Is there a (preferably tidyverse) solution for setting/converting every df2 column type according to its corresponding df1 column's data type, so that I can continue on to merging the dfs with bind_rows etc.? Trying to avoid hardcoding if possible.


Solution

  • Sample data:

    mt1 <- mtcars[1:3,]
    mt2 <- mtcars[1:3,]
    class(mt2$cyl) <- "character"
    sapply(mt2, class)
    #         mpg         cyl        disp          hp        drat          wt        qsec          vs          am 
    #   "numeric" "character"   "numeric"   "numeric"   "numeric"   "numeric"   "numeric"   "numeric"   "numeric" 
    #        gear        carb 
    #   "numeric"   "numeric" 
    

    Base R

    The simplest:

    mt2fixed <- Map(function(this, oth) `class<-`(this, class(oth)), mt2, mt1) |>
      as.data.frame()
    sapply(mt2fixed, class)
    #       mpg       cyl      disp        hp      drat        wt      qsec        vs        am      gear      carb 
    # "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" 
    

    The use of `class<-` in a single expression is equivalent to a reassignment and returning the updated vector. For instance, these two are equivalent:

    `class<-`(vec, newclass)
    { class(vec) <- newclass; vec; }
    

    The main practical difference is that the first form allows a shorter anonymous function, with no need for surrounding braces. (The same applies to the dplyr solution below.)
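To see the equivalence concretely, here is a small sketch. The vector `vec` and its values are made up for illustration, and `tmp` is just a scratch name so that `vec` itself is left untouched.

```r
vec <- c("1", "2", "3")  # hypothetical character vector

# Functional form: returns the converted copy in a single expression
a <- `class<-`(vec, "numeric")

# Braced form: assign the class on a copy, then return that copy explicitly
b <- { tmp <- vec; class(tmp) <- "numeric"; tmp }

identical(a, b)  # TRUE
is.numeric(a)    # TRUE
```

Both forms produce the same numeric vector; the functional form simply avoids the intermediate assignment.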

    If done in place, it can be a little less verbose:

    mt2[] <- Map(function(this, oth) `class<-`(this, class(oth)), mt2, mt1)
    

    The use of mt2[] on the LHS of the assignment ensures that the overall class of "data.frame" is preserved (otherwise it'll be a list).
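A quick way to check the effect of `mt2[]`, as a sketch that rebuilds the sample data so it runs on its own. `fix_class` and `as_list` are names introduced here for illustration only.

```r
mt1 <- mtcars[1:3, ]
mt2 <- mtcars[1:3, ]
class(mt2$cyl) <- "character"

# Same anonymous function as above, named here for reuse (illustrative name)
fix_class <- function(this, oth) `class<-`(this, class(oth))

as_list <- Map(fix_class, mt2, mt1)  # plain result of Map(): a named list
mt2[]   <- Map(fix_class, mt2, mt1)  # mt2[] fills the existing data.frame

class(as_list)  # "list"
class(mt2)      # "data.frame"
```

Assigning into `mt2[]` also keeps the original row names, which a bare list would not carry.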

    dplyr

    library(dplyr)
    tibble(mt2)
    # # A tibble: 3 × 11
    #     mpg cyl    disp    hp  drat    wt  qsec    vs    am  gear  carb
    #   <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    # 1  21   6       160   110  3.9   2.62  16.5     0     1     4     4
    # 2  21   6       160   110  3.9   2.88  17.0     0     1     4     4
    # 3  22.8 4       108    93  3.85  2.32  18.6     1     1     4     1
    tibble(mt2) %>%
      mutate(across(everything(), ~ `class<-`(.x, class(mt1[[cur_column()]]))))
    # # A tibble: 3 × 11
    #     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
    #   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    # 1  21       6   160   110  3.9   2.62  16.5     0     1     4     4
    # 2  21       6   160   110  3.9   2.88  17.0     0     1     4     4
    # 3  22.8     4   108    93  3.85  2.32  18.6     1     1     4     1
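One caveat for the date and time columns mentioned in the question: as far as I know, `class<-` coerces the data only for basic types such as "numeric" or "character". For a class like "Date" it just sets the class attribute, which would leave a character vector mislabeled rather than converted. A hedged sketch of one way around this follows. The helper `convert_like()` is hypothetical (not part of base R or dplyr), the data frames `df1` and `df2` are made up, and the `"UTC"` time zone is an assumption you would adjust to your data.

```r
# Hypothetical helper: convert x to the type of template
convert_like <- function(x, template) {
  if (inherits(template, "Date")) {
    as.Date(x)                       # parse dates, e.g. "2024-01-03"
  } else if (inherits(template, "POSIXct")) {
    as.POSIXct(x, tz = "UTC")        # assumed time zone; adjust as needed
  } else {
    `class<-`(x, class(template))    # basic types: same approach as above
  }
}

# Made-up example data: df2 arrived with everything as character
df1 <- data.frame(id = 1:2,
                  when = as.Date(c("2024-01-01", "2024-01-02")))
df2 <- data.frame(id = c("3", "4"),
                  when = c("2024-01-03", "2024-01-04"),
                  stringsAsFactors = FALSE)

df2[] <- Map(convert_like, df2, df1)
sapply(df2, class)
```

Other classes (factors, difftime, and so on) would need their own branches; this only covers the cases shown.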