How to create a table that indicates existence of factor levels in several data.frames


I have several data.frames that have some common factor variables. However, missing observations cause discrepancies and some levels go missing in some of the data.frames. I would like to create a summary table that indicates which data.frames include the levels and which do not.

Something like this:

FACTOR1

            DF.1   DF.2
  LEVEL1    TRUE   TRUE
  LEVEL2    TRUE   FALSE

where LEVEL1 of FACTOR1 appears in both data.frames DF.1 and DF.2, whereas LEVEL2 appears in DF.1 but not in DF.2.


Solution

  • Try:

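    # stringsAsFactors=TRUE is needed in R >= 4.0, where strings are
    # no longer converted to factors by default
    df.1 = data.frame(var=c('a','b','c','d'), stringsAsFactors=TRUE)
    df.2 = data.frame(var=c('a','b','c'), stringsAsFactors=TRUE)
    df.3 = data.frame(var=c('a','d','d'), stringsAsFactors=TRUE)
    
    # Collect df.1, df.2 and df.3 into a list
    ldf = list()
    for(i in 1:3){
        ldf[[length(ldf)+1]] = get(paste0('df','.',i))
    }
    # Union of the levels found in any of the data.frames
    ll = lapply(ldf, function(x) levels(x$var))
    levellist = unique(unlist(ll))
    
    levellist
    # [1] "a" "b" "c" "d"
    
    # One row per level (in the order of levellist), one column per data.frame
    sapply(ldf, function(x) levellist %in% levels(x$var))
    #      [,1]  [,2]  [,3]
    # [1,] TRUE  TRUE  TRUE
    # [2,] TRUE  TRUE FALSE
    # [3,] TRUE  TRUE FALSE
    # [4,] TRUE FALSE  TRUE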
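
The matrix above has no row or column labels, so it does not yet look like the table asked for in the question. A minimal sketch of a labelled version follows. It assumes the data.frames are kept in a named list; `level_table` and the names `DF.1`, `DF.2`, `DF.3` are hypothetical and chosen only for illustration.

```r
# Same example data as above (stringsAsFactors = TRUE so that var is a factor)
df.1 <- data.frame(var = c("a", "b", "c", "d"), stringsAsFactors = TRUE)
df.2 <- data.frame(var = c("a", "b", "c"), stringsAsFactors = TRUE)
df.3 <- data.frame(var = c("a", "d", "d"), stringsAsFactors = TRUE)

# A named list: the names become the column headers of the result
ldf <- list(DF.1 = df.1, DF.2 = df.2, DF.3 = df.3)

# level_table() is a hypothetical helper, not part of base R.
# dfs: named list of data.frames; factor_name: name of the factor column
level_table <- function(dfs, factor_name) {
  # Union of the levels found in any of the data.frames
  all_levels <- unique(unlist(lapply(dfs, function(d) levels(d[[factor_name]]))))
  # For each data.frame, flag which of those levels it contains
  presence <- vapply(dfs,
                     function(d) all_levels %in% levels(d[[factor_name]]),
                     logical(length(all_levels)))
  # Force a matrix (vapply returns a plain vector when there is only one level)
  # and label rows with the levels and columns with the list names
  matrix(presence, nrow = length(all_levels),
         dimnames = list(all_levels, names(dfs)))
}

tab <- level_table(ldf, "var")
tab
#   DF.1  DF.2  DF.3
# a TRUE  TRUE  TRUE
# b TRUE  TRUE FALSE
# c TRUE  TRUE FALSE
# d TRUE FALSE  TRUE
```

Calling `level_table` once per shared factor column gives one such table per factor. Note that `levels()` reports levels that are declared on the factor even if no row uses them; if only levels that actually occur should count, applying `droplevels()` to each data.frame first is one option.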