I have several data.frames that share some factor variables. However, because of missing observations, some levels are absent from some of the data.frames. I would like to create a summary table that indicates which data.frames include each level and which do not.
Like this:
FACTOR1
        DF.1   DF.2
LEVEL1  TRUE   TRUE
LEVEL2  TRUE   FALSE
where LEVEL1 of FACTOR1 appears in both DF.1 and DF.2, whereas LEVEL2 appears in DF.1 but not in DF.2.
Try:
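# stringsAsFactors=TRUE is needed on R >= 4.0, where character columns
# are no longer converted to factors by default
df.1 = data.frame(var=c('a','b','c','d'), stringsAsFactors=TRUE)
df.2 = data.frame(var=c('a','b','c'), stringsAsFactors=TRUE)
df.3 = data.frame(var=c('a','d','d'), stringsAsFactors=TRUE)
# collect the data.frames into a list
ldf = list()
for(i in 1:3){
  ldf[[length(ldf)+1]] = get(paste0('df','.',i))
}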
# levels of var in each data.frame, then the union of all levels
ll = lapply(ldf, function(x) levels(x$var))
levellist = unique(unlist(ll))
levellist
[1] "a" "b" "c" "d"
sapply(ldf, function(x) {levellist %in% levels(x$var) })
     [,1]  [,2]  [,3]
[1,] TRUE  TRUE  TRUE
[2,] TRUE  TRUE FALSE
[3,] TRUE  TRUE FALSE
[4,] TRUE FALSE  TRUE
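The matrix above has no labels, so you have to remember that rows follow levellist and columns follow the order of ldf. If you want the labeled layout from the question, one option is to keep the data.frames in a named list and attach dimnames to the result. This is only a sketch: level_table is a made-up helper name, the example data are the same toy data.frames as above, and it assumes the column is a factor in every data.frame.

```r
# Toy data (same as above); stringsAsFactors = TRUE matters on R >= 4.0
df.1 <- data.frame(var = c("a", "b", "c", "d"), stringsAsFactors = TRUE)
df.2 <- data.frame(var = c("a", "b", "c"), stringsAsFactors = TRUE)
df.3 <- data.frame(var = c("a", "d", "d"), stringsAsFactors = TRUE)

# A named list, so the names can be reused as column labels
ldf <- list(df.1 = df.1, df.2 = df.2, df.3 = df.3)

# Hypothetical helper: which data.frames contain which levels of `column`
level_table <- function(dfs, column) {
  lev <- lapply(dfs, function(x) levels(x[[column]]))
  all_levels <- unique(unlist(lev))
  present <- vapply(lev, function(l) all_levels %in% l,
                    logical(length(all_levels)))
  # matrix() keeps the result two-dimensional even with a single level
  matrix(present, nrow = length(all_levels),
         dimnames = list(all_levels, names(dfs)))
}

tab <- level_table(ldf, "var")
print(tab)
```

Here tab has one row per level (a to d) and one column per data.frame, so tab["d", "df.2"] is FALSE while tab["d", "df.3"] is TRUE. For several shared factors you could call the helper once per column name, e.g. with lapply over a vector of names.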