
Get dummy (T/F) variables from list embedded within data frame


I have a data.frame in which cells contain a list of terms.

I wish to produce a new variable for each term found in that list, indicating whether the term is present or not in that given cell.

I have multiple such instances in a data.frame and do not know a priori the composition of the lists.

An example data.frame

library(plyr)

example <- data.frame(groups = letters)

# for each row, attach a list cell holding four randomly sampled capital letters
example <- adply(example,
                 1,
                 function(x) data.frame(value = t(list(sample(LETTERS, 4)))))

    groups      value
1      a F, Y, N, X
2      b N, D, B, Y
3      c W, J, S, U
4      d I, S, N, A
5      e S, Z, Y, A
6      f O, R, J, A
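
Because the example is sampled at random, the printed values change on every run. A minimal sketch of a reproducible version in base R, assuming the only goal is a list-column with four terms per row (the seed value is an arbitrary choice, not from the original post):

```r
# Sketch only: a reproducible stand-in for the plyr example above.
set.seed(1)  # assumption: any fixed seed will do

example <- data.frame(groups = letters, stringsAsFactors = FALSE)

# Assigning a list with one element per row creates a list-column,
# so each cell of `value` holds a character vector of four terms.
example$value <- lapply(seq_len(nrow(example)), function(i) sample(LETTERS, 4))

head(example)
```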

From this, I wish to obtain

  groups     F     N   ...
1      a  TRUE  TRUE   ...
2      b FALSE  TRUE   ...
3      c FALSE FALSE   ...

Solution

  • As per your request, here it is in function form.

    Example:

    myMatrix <- checkValues(example, makeMatrix=TRUE)
    myMatrix
    
    #        A     B     C     D     E     F  ...
    #   a FALSE FALSE FALSE FALSE FALSE FALSE ...
    #   b FALSE FALSE FALSE FALSE FALSE  TRUE ...
    #   c FALSE FALSE FALSE  TRUE FALSE FALSE ...
    #   d FALSE  TRUE FALSE  TRUE FALSE FALSE ...
    #   e  TRUE FALSE FALSE FALSE FALSE FALSE ...
    #   .
    #   .
    #   .
    


    Function:

    checkValues <- function(myDF, makeMatrix=FALSE, makeUnique=TRUE, sort=TRUE)  {
      # myDF should be a data frame with a `group` column and a list-column `value`
      #   (`$` partial matching means a column named `groups` also works)
      # if `makeMatrix` is TRUE, the per-row matrices are combined into one wide matrix
      # `makeUnique` and `sort` only apply if `makeMatrix` is TRUE
      #   (otherwise, they are ignored)
    
      res<- 
      lapply(myDF$value, function(L1) 
          t(sapply(myDF$value, function(L2) L1 %in% L2 ))
      )
    
      # Make the names purtty 
      names(res) <- myDF$group
    
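      # label each matrix: rows are the groups, columns are that row's own terms
      # (seq_along is safe when `res` is empty, unlike 1:length(res))
      for (i in seq_along(res))
          dimnames(res[[i]]) <- list(myDF$group, myDF$value[[i]])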
    
      # convert the list to matrix
      if (makeMatrix)  {  
        res <- do.call(cbind, res)
    
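        # remove duplicates, if required
        #   (a term yields an identical column every time it appears, so
        #    de-duplicate by column name; comparing column contents would also
        #    drop distinct terms that happen to occur in exactly the same rows)
        if (makeUnique) 
          res <- res[, !duplicated(colnames(res)), drop=FALSE]
    
        # order columns, if required
        if (sort)
          res <- res[, order(colnames(res)), drop=FALSE]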
      }
    
      return(res)
    }
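
For comparison, the same indicator matrix can be sketched more compactly by collecting the unique terms first and then testing each row against them. This is an alternative approach, not the function above: `termIndicators` is a hypothetical helper name, it assumes `values` is a non-empty list of character vectors, and the toy data are the first three rows printed in the question.

```r
# Hypothetical helper (not part of the answer above).
# values: list of character vectors, one per row
# ids:    optional row labels
termIndicators <- function(values, ids = NULL) {
  # every distinct term across all rows, in alphabetical order
  terms <- sort(unique(unlist(values)))

  # one logical row per list element: is each term present in that element?
  out <- do.call(rbind, lapply(values, function(v) terms %in% v))

  colnames(out) <- terms
  if (!is.null(ids)) rownames(out) <- as.character(ids)
  out
}

# Toy data: the first three rows shown in the question
toy <- data.frame(groups = c("a", "b", "c"), stringsAsFactors = FALSE)
toy$value <- list(c("F", "Y", "N", "X"),
                  c("N", "D", "B", "Y"),
                  c("W", "J", "S", "U"))

m <- termIndicators(toy$value, toy$groups)
m[, c("F", "N")]
#       F     N
# a  TRUE  TRUE
# b FALSE  TRUE
# c FALSE FALSE

# Attach the indicators to the group labels, as in the requested layout
wide <- cbind(toy["groups"], as.data.frame(m))
```

Building the term list once avoids comparing every row against every other row, and it leaves no duplicate columns to remove afterwards.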