I have a data.frame in which cells contain a list of terms.
I wish to produce a new variable for each term found in those lists, indicating whether or not the term is present in a given cell.
I have several such instances in a data.frame and do not know a priori the composition of the lists.
An example data.frame:
library(plyr)
example <- data.frame(groups=letters)
example <- adply(example,
                 1,
                 function(x) data.frame(value=t(list(sample(LETTERS, 4)))))
groups value
1 a F, Y, N, X
2 b N, D, B, Y
3 c W, J, S, U
4 d I, S, N, A
5 e S, Z, Y, A
6 f O, R, J, A
From this, I wish to obtain:
  groups     F     N ...
1      a  TRUE  TRUE ...
2      b FALSE  TRUE ...
3      c FALSE FALSE ...
As per your request, here it is in function form.
Example:
myMatrix <- checkValues(example, makeMatrix=TRUE)
myMatrix
# A B C D E F ...
# a FALSE FALSE FALSE FALSE FALSE FALSE ...
# b FALSE FALSE FALSE FALSE FALSE TRUE ...
# c FALSE FALSE FALSE TRUE FALSE FALSE ...
# d FALSE TRUE FALSE TRUE FALSE FALSE ...
# e TRUE FALSE FALSE FALSE FALSE FALSE ...
# .
# .
# .
Function:
checkValues <- function(myDF, makeMatrix=FALSE, makeUnique=TRUE, sort=TRUE) {
  # myDF should be a data frame containing columns `groups` and `value`,
  # where `value` is a list column (as in the example above)
  # if `makeMatrix` is TRUE, the list is combined into a single matrix
  # `makeUnique` and `sort` only apply if `makeMatrix` is TRUE
  # (otherwise, they are ignored)

  # for each row's terms (L1), test membership in every row's terms (L2)
  res <-
    lapply(myDF$value, function(L1)
      t(sapply(myDF$value, function(L2) L1 %in% L2))
    )

  # Make the names pretty
  names(res) <- myDF$groups
  for (i in seq_along(res))
    dimnames(res[[i]]) <- list(myDF$groups, myDF$value[[i]])

  # convert the list to matrix
  if (makeMatrix) {
    res <- do.call(cbind, res)

    # remove duplicated term columns (matched by name), if required
    if (makeUnique)
      res <- res[, !duplicated(colnames(res)), drop=FALSE]

    # order columns, if required
    if (sort)
      res <- res[, order(colnames(res)), drop=FALSE]
  }

  return(res)
}
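
If only the final indicator table is needed, a more direct route is to collect the distinct terms first and then test each one against every row. The following is a minimal sketch in base R, not a replacement for the function above. It assumes `value` is a list column of character vectors. The helper name `termIndicators` and the small fixed data set are made up for illustration; the three rows are copied from the first rows of the question's printed example.

```r
# Toy data with a list column, built in base R (fixed values, no sampling)
example <- data.frame(groups = c("a", "b", "c"), stringsAsFactors = FALSE)
example$value <- list(c("F", "Y", "N", "X"),
                      c("N", "D", "B", "Y"),
                      c("W", "J", "S", "U"))

# Hypothetical helper: returns `groups` plus one logical column per distinct term
termIndicators <- function(myDF) {
  # every distinct term that occurs anywhere in the list column
  terms <- sort(unique(unlist(myDF$value)))

  # for each term, check each row's vector for membership
  ind <- vapply(terms,
                function(term) vapply(myDF$value,
                                      function(v) term %in% v,
                                      logical(1)),
                logical(nrow(myDF)))

  # vapply returns a plain vector when there is a single row; force a matrix
  ind <- matrix(ind, nrow = nrow(myDF), dimnames = list(NULL, terms))

  data.frame(groups = myDF$groups, ind, check.names = FALSE)
}

out <- termIndicators(example)
out[, c("groups", "F", "N")]
```

For these three rows, the `F` column is TRUE only for group a and the `N` column is TRUE for groups a and b, which is the layout asked for in the question. Because the terms are deduplicated before any testing, no duplicate-removal step is needed afterwards.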