
Assigning data.frame column classes from another data.frame in R


I have a list of data.frames, sampleList. Each data.frame in that list differs a bit in which columns it has and in their order.

I also have another data.frame, refData1, which I would like to use as a model for the rest: columns with the same names should have the same class as in refData1.

In other words, I would like sampleList to be updated so that every column that also appears in refData1 takes on the class of the corresponding column in refData1. Columns without a match should stay as they are. Please note that refData1 also contains a column that none of the data.frames in the list has. Thank you.

sampleData1 <- data.frame(id = 1:10, 
                          gender = as.factor(sample(c("Male", "Female"), 
                                                    10, replace = TRUE)),
                          age = as.character(rnorm(10, 40, 10)),
                          height = as.character(rnorm(10,170,5)))
sampleData2 <- data.frame(weight = as.character(rnorm(10,80,5)),
                          gender = sample(c("Male", "Female"), 
                                          10, replace = TRUE),
                          id = 11:20, 
                          age = rnorm(10, 44, 10))
sampleData3 <- data.frame(id = as.factor(21:30), 
                          age = as.character(rnorm(10, 36, 10)),
                          gender = sample(c("Male", "Female"), 10, 
                                          replace = TRUE),
                          score = as.character(rnorm(10,20,2)))
sampleList <- list(sampleData1,sampleData2,sampleData3)

refData1 <- data.frame(id = 1:10, # integer
                       gender1 = as.character(sample(c("Male", "Female"), 
                                                     10, replace = TRUE)),
                       agen = rnorm(10, 40, 10), # numeric
                       height = rnorm(10,170,5), # numeric
                       weight = rnorm(10,80,5),  # numeric
                       other = as.factor(sample(c("a", "b","c"), 
                                                10, replace = TRUE)))

Solution

  • We loop through 'sampleList', find the columns each data.frame shares with 'refData1', and set the class of each shared column to the class of the corresponding column in 'refData1':

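    sampleListN <- lapply(sampleList, function(x) {
      # columns present in both this data.frame and refData1
      nm1 <- intersect(names(x), names(refData1))
      # u: column from x, v: matching column from refData1
      x[nm1] <- Map(function(u, v) {
        class(u) <- class(v)
        u
      }, x[nm1], refData1[nm1])
      x
    })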
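
    To see what assigning a class does to a single column, here is a minimal sketch on made-up vectors (the values are illustrative and not taken from sampleList). It suggests that the assignment behaves as hoped for character columns, but that a factor ends up holding its internal level codes:

```r
# Illustrative values only, not taken from sampleList
chr <- c("170.5", "168.2")
class(chr) <- "numeric"        # coerces the character vector to double
is.numeric(chr)                # TRUE

f <- factor(c(21, 22, 30))
as.integer(f)                  # 1 2 3    (internal level codes)
as.integer(as.character(f))    # 21 22 30 (the actual values)

g <- f
class(g) <- "integer"          # drops the factor class, keeps the codes
as.vector(g)                   # 1 2 3
```

    Going through as.character() first is the usual way to recover the values a factor displays.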
    

    As @mt1022 mentioned, this logic has a problem: converting a factor column directly to integer yields wrong values, i.e. the internal integer codes of the levels rather than the actual values. Given this data, we don't even need to compare classes with 'refData1'; type.convert can infer each column's type automatically from its values:

    sampleListN <- lapply(sampleList, function(x) {
      nm1 <- intersect(names(x), names(refData1))
      # as.character() first, so factors yield their labels, not their codes
      x[nm1] <- lapply(x[nm1], function(col)
        type.convert(as.character(col), as.is = TRUE))
      x
    })
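
    Note that type.convert infers the type from the values themselves, so the result is not guaranteed to match 'refData1' for other data. If the reference classes must be followed, one possible variant is sketched below. The helper convert_like and the small dat/ref data.frames are hypothetical names introduced for illustration, and only a few basic classes are handled:

```r
# Hypothetical helper: convert column u to the basic class of reference column v.
# Going through as.character() avoids the factor-code pitfall.
convert_like <- function(u, v) {
  u_chr <- as.character(u)
  if (is.factor(v)) {
    factor(u_chr)
  } else if (is.integer(v)) {
    as.integer(u_chr)
  } else if (is.numeric(v)) {
    as.numeric(u_chr)
  } else if (is.logical(v)) {
    as.logical(u_chr)
  } else if (is.character(v)) {
    u_chr
  } else {
    u  # unknown reference class: leave the column untouched
  }
}

# Made-up miniature data, standing in for refData1 and one element of sampleList
ref <- data.frame(id = 1:3,
                  height = c(170.5, 168.2, 181),
                  other = factor(c("a", "b", "c")))
dat <- data.frame(id = factor(21:23),
                  height = c("170.5", "168.2", "181"),
                  score = c("x", "y", "z"),
                  stringsAsFactors = FALSE)

common <- intersect(names(dat), names(ref))
dat[common] <- Map(convert_like, dat[common], ref[common])
sapply(dat, class)
```

    With the full data, the same Map call could replace the inner step of the lapply over 'sampleList'. Values that cannot be parsed (for example non-numeric text in a column converted to numeric) would become NA with a warning.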