Search code examples
rclassmergerbindrbindlist

Make rbindlist skip, ignore or change class attribute of the column


I would like to merge a large set of dataframes (about 30), which each have about 200 variables. These datasets are very much alike but not identical.

Please find two example dataframes below:

library(data.table)
library(haven)
df1 <- fread(
    "A   B   C  iso   year   
     0   B   1  NLD   2009   
     1   A   2  NLD   2009   
     0   Y   3  AUS   2011   
     1   Q   4  AUS   2011   
     0   NA  7  NLD   2008   
     1   0   1  NLD   2008   
     0   1   3  AUS   2012",
  header = TRUE
)
df2 <- fread(
    "A   B   D  E  iso   year   
     0   1   1  NA ECU   2009   
     1   0   2  0  ECU   2009   
     0   0   3  0  BRA   2011   
     1   0   4  0  BRA   2011   
     0   1   7  NA ECU   2008   
     1   0   1  0  ECU   2008   
     0   0   3  2  BRA   2012   
     1   0   4  NA BRA   2012",
  header = TRUE
)

To recreate the error:

class(df2$B) <- "anything"

When I do the following

df_merged <- rbindlist(list(df1, df2), fill=TRUE, use.names=TRUE)

The dataset gives the error:

Error in rbindlist(list(df1, df2), fill = TRUE, use.names = TRUE) : 
  Class attribute on column 2 of item 2 does not match with column 2 of item 1.

What can I do to either:

  1. Make rbindlist skip the column which does not match and add some suffix.
  2. Change the class of one of the columns to the other one.

Desired result for option 1:

df_merged <- fread(
    "A   B  B.x  C  D   E   iso   year   
     0   A   NA  1  NA  NA  NLD   2009   
     1   Y   NA  2  NA  NA  NLD   2009   
     0   Q   NA  3  NA  NA  AUS   2011   
     1   NA  NA  4  NA  NA  AUS   2011   
     0   0   NA  7  NA  NA  NLD   2008   
     1   1   NA  1  NA  NA  NLD   2008   
     0   1   NA  3  NA  NA  AUS   2012   
     0   NA  1   NA  1  NA  ECU   2009   
     1   NA  0   NA  2  0   ECU   2009   
     0   NA  0   NA  3  0   BRA   2011   
     1   NA  0   NA  4  0   BRA   2011   
     0   NA  1   NA  7  NA  ECU   2008   
     1   NA  0   NA  1  0   ECU   2008   
     0   NA  0   NA  3  2   BRA   2012   
     1   NA  0   NA  4  NA  BRA   2012",
   header = TRUE
)

Desired result for option 2:

df_merged <- fread(
    "A   B   C  D   E   iso   year   
     0   3   1  NA  NA  NLD   2009   
     1   4   2  NA  NA  NLD   2009   
     0   5   3  NA  NA  AUS   2011   
     1   5   4  NA  NA  AUS   2011   
     0   0   7  NA  NA  NLD   2008   
     1   1   1  NA  NA  NLD   2008   
     0   1   3  NA  NA  AUS   2012   
     0   1   NA  1  NA  ECU   2009   
     1   0   NA  2  0   ECU   2009   
     0   0   NA  3  0   BRA   2011   
     1   0   NA  4  0   BRA   2011   
     0   1   NA  7  NA  ECU   2008   
     1   0   NA  1  0   ECU   2008   
     0   0   NA  3  2   BRA   2012   
     1   0   NA  4  NA  BRA   2012",",
   header = TRUE
)

Solution

  • I came up with this inelegant solution that bypasses the problem. Basically, What I am doing is to assign the attributes of the columns of the first item of the list to the columns with the same names of all the other items of the list. Keep in mind that this solution is problematic and, depending on the project, it could be a very wrong practice as it has the potential to mess up your data. However, if what you need is to use rbindlist to combine your dataframes, this makes the trick

    
    dfs <- list(df1, df2)
    varnames <- names(dfs[[1]]) # variable names
    vattr <- purrr::map_chr(varnames, ~class(dfs[[1]][[.x]])) # variable attributes
    
    for (i in seq_along(dfs)) {
      # assign the same attributes of list 1 to the rest of the lists
      for (j in seq_along(varnames)) {
        if (varnames[[j]]  %in% names(dfs[[i]])) {
          class(dfs[[i]][[varnames[[j]]]]) <- vattr[[j]]
        } 
      }
    }
    
    
    df_merged <- data.table::rbindlist(dfs, fill=TRUE, use.names=TRUE)
    

    Best,