Comparing Column names in R across various data frames


I am currently trying to compare the column classes and names of several data frames in R before doing any transformations and calculations. My code is below:

    library(dplyr)
    m1 <- mtcars
    m2 <- mtcars %>% mutate(cyl = factor(cyl), xxxx1 = factor(cyl))
    m3 <- mtcars %>% mutate(cyl = factor(cyl), xxxx2 = factor(cyl))
    
    out <- cbind(sapply(m1, class), sapply(m2, class), sapply(m3, class))

Ideally the solution would work on data frames stored in a list, because all my data frames are currently kept in a list for easier processing:

    All.list <- list(m1, m2, m3)
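Before comparing classes, it can help to see how the column names themselves differ across the list. A minimal sketch, assuming the same m1/m2/m3 setup; the names n_cols, common and extra are just illustrative:

```r
library(dplyr)

m1 <- mtcars
m2 <- mtcars %>% mutate(cyl = factor(cyl), xxxx1 = factor(cyl))
m3 <- mtcars %>% mutate(cyl = factor(cyl), xxxx2 = factor(cyl))
All.list <- list(m1, m2, m3)

col_names <- lapply(All.list, names)

# Number of columns in each data frame (11, 12, 12 here)
n_cols <- lengths(col_names)

# Columns present in every data frame
common <- Reduce(intersect, col_names)

# Columns present in only some of the data frames
extra <- setdiff(Reduce(union, col_names), common)
```

Here extra picks out xxxx1 and xxxx2, which are exactly the columns that a position-based cbind() cannot line up.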

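I would like the output displayed as a matrix, as in "out", but the result in "out" is incorrect: cbind() pairs the class vectors by position rather than by column name, and because m1 has 11 columns while m2 and m3 have 12, it recycles the shorter vector (with a warning). The extra columns xxxx1 and xxxx2 therefore end up with wrong or misattributed classes. I am expecting something more like a table with one row per column name and one column per data frame, showing each column's class, with NA where a data frame does not have that column.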

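Because the problem with cbind() is positional matching, one base-R way to get the intended matrix is to look each data frame's classes up by column name. A minimal sketch, assuming a multi-class column (e.g. POSIXct) can be summarized by class(x)[1]; naming the list m1/m2/m3 is an assumption made only so the matrix gets readable column headers:

```r
library(dplyr)

m1 <- mtcars
m2 <- mtcars %>% mutate(cyl = factor(cyl), xxxx1 = factor(cyl))
m3 <- mtcars %>% mutate(cyl = factor(cyl), xxxx2 = factor(cyl))
All.list <- list(m1 = m1, m2 = m2, m3 = m3)

# Named character vector of classes for each data frame;
# class(x)[1] keeps exactly one entry per column
classes <- lapply(All.list, function(df) vapply(df, function(x) class(x)[1], character(1)))

# Every column name that appears in at least one data frame
all_names <- unique(unlist(lapply(classes, names)))

# Index by name: a missing column gives NA instead of a recycled value
out <- sapply(classes, function(cl) unname(cl[all_names]))
rownames(out) <- all_names
out
```

The result is a 13 x 3 character matrix whose rows are matched by name, so xxxx1 is NA for m1 and m3 rather than borrowing another column's class.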


Solution

  • I think the easiest way is to define a function that returns each column's name and class, and then combine lapply(), dplyr and tidyr's spread() to reshape the result into the table you want. Here is how I did it:

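    library(dplyr)
    library(tidyr)   # spread() comes from tidyr, not dplyr
    
    m1 <- mtcars
    m2 <- mtcars %>% mutate(cyl = factor(cyl), xxxx1 = factor(cyl))
    m3 <- mtcars %>% mutate(cyl = factor(cyl), xxxx2 = factor(cyl))
    
    All.list <- list(m1, m2, m3)
    
    ## Define a function that returns one row per column: its name and class
    my_function <- function(data_frame) {
      tibble(var_name = colnames(data_frame),
             # class(x)[1] keeps this a plain character vector even for
             # columns with more than one class (e.g. POSIXct)
             var_type = vapply(data_frame, function(x) class(x)[1], character(1)))
    }
    
    ## Stack the per-data-frame results, tag each row with its list position,
    ## then spread so there is one column per data frame (NA where missing)
    target <- lapply(seq_along(All.list), function(i)
        my_function(All.list[[i]]) %>% mutate(element = i)) %>%
      bind_rows() %>%
      spread(element, var_type)
    
    target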
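In current tidyr, spread() is superseded by pivot_wider(), and bind_rows() can number the stacked data frames itself through its .id argument. A sketch of the same approach under those substitutions; col_classes is a hypothetical name standing in for my_function:

```r
library(dplyr)
library(tidyr)

m1 <- mtcars
m2 <- mtcars %>% mutate(cyl = factor(cyl), xxxx1 = factor(cyl))
m3 <- mtcars %>% mutate(cyl = factor(cyl), xxxx2 = factor(cyl))
All.list <- list(m1, m2, m3)

# One row per column: its name and (first) class
col_classes <- function(df) {
  tibble(var_name = names(df),
         var_type = unname(vapply(df, function(x) class(x)[1], character(1))))
}

target <- All.list %>%
  lapply(col_classes) %>%
  # For an unnamed list, .id labels the rows "1", "2", "3" by list position
  bind_rows(.id = "element") %>%
  # One column per data frame; NA where a data frame lacks that column
  pivot_wider(names_from = element, values_from = var_type)

target
```

If the list is named (e.g. list(m1 = m1, m2 = m2, m3 = m3)), .id uses those names instead, so the output columns are labelled by data frame rather than by position.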