
How to transform a list (strsplit output) into a logical data frame (according to column names) in R


This is my first post, and I do not have programming experience.

Problem:

I have a list of 200 character vectors, each with 0 to 7 elements (this list is the output of the strsplit function):

> input
[[1]]
[1] "foo" "bar" "norf"

[[2]]
[1] "norf"

[[3]]
[1] NA

.....
[[200]]
[1] "hello" "norf"

I also have a character vector of all strings that can occur in input:

possible_strings <- c("foo","bar","hello",...)

I want to convert it into a data frame (or similar object that gets the job done) of the following format:

> res
        foo   bar   norf  hello
[1,  ]  TRUE  TRUE  TRUE  FALSE
[2,  ]  FALSE FALSE TRUE  FALSE
[3,  ]  FALSE FALSE FALSE FALSE
[...]
[200,]  FALSE FALSE TRUE  TRUE

I tried extensively to convert it. The furthest I got was a data frame with all possible strings as column names and the character strings in the rows, padded with NAs (I used rbind.fill in the process).

Any help would be greatly appreciated,

Thanks!


Solution

  • In your original question, you say you'd like the result to be a data frame, but the result, res, you show is actually a matrix. Therefore, my first result below is a matrix, and then I convert it to a data frame with as.data.frame().

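    This can be done fairly easily with sapply() and %in%. sapply() goes through the list one element at a time and, for each element x, evaluates possStr %in% x, which returns one logical value per entry of possStr (TRUE if that string occurs in x). sapply() binds these vectors together as columns, so t() is needed to get one row per list element.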

    > input <- list(c("foo", "bar", "norf"), "norf", NA, c("hello", "norf"))
    > possStr <- c("foo", "bar", "norf", "hello")
    
    > d <- t(sapply(input, function(x) possStr %in% x ))
    > colnames(d) <- possStr 
    > d                                       ## in matrix form
    #        foo   bar  norf hello
    # [1,]  TRUE  TRUE  TRUE FALSE
    # [2,] FALSE FALSE  TRUE FALSE
    # [3,] FALSE FALSE FALSE FALSE
    # [4,] FALSE FALSE  TRUE  TRUE
    
    > as.data.frame(d)                        ## convert to data frame
    #     foo   bar  norf hello
    # 1  TRUE  TRUE  TRUE FALSE
    # 2 FALSE FALSE  TRUE FALSE
    # 3 FALSE FALSE FALSE FALSE
    # 4 FALSE FALSE  TRUE  TRUE
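
For use in a script rather than at the console, the same idea can be sketched as below. This is a minimal variant, not a replacement for the answer above: the `raw` vector and the "," separator are assumptions made for illustration (the question shows only the strsplit output), and vapply() is swapped in for sapply() so that R checks that every per-element result is a logical vector of length(possStr).

```r
# Hypothetical raw strings and separator; the question shows only the
# strsplit() output, not what was split or on which character.
raw <- c("foo,bar,norf", "norf", NA, "hello,norf")
input <- strsplit(raw, ",", fixed = TRUE)

possStr <- c("foo", "bar", "norf", "hello")

# One step in isolation: which entries of possStr occur in the first element?
first <- possStr %in% input[[1]]

# vapply() returns one column per list element, so transpose to get
# one row per list element and one column per possible string.
d <- t(vapply(input, function(x) possStr %in% x, logical(length(possStr))))
colnames(d) <- possStr

# Convert the logical matrix to a data frame.
res <- as.data.frame(d)
```

An NA element matches none of the possible strings, so its row is all FALSE, as in the matrix shown above.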