This is my first post, and I do not have much programming experience.
Problem:
I have a list of 200 character vectors, each with 0 to 7 elements (the list is the output of the strsplit function):
> input
[[1]]
[1] "foo" "bar" "norf"
[[2]]
[1] "norf"
[[3]]
[1] NA
.....
[[200]]
[1] "hello" "norf"
I also have a character vector of all the strings that can occur in input:
possible_strings <- c("foo","bar","hello",...)
I want to convert it into a data frame (or similar object that gets the job done) of the following format:
> res
foo bar norf hello
[1, ] TRUE TRUE TRUE FALSE
[2, ] FALSE FALSE TRUE FALSE
[3, ] FALSE FALSE FALSE FALSE
[...]
[200,] FALSE FALSE TRUE TRUE
I tried extensively to convert it, and the furthest I got was a data frame with all possible strings as column names, where each row held the character strings themselves and the gaps were filled with NAs (I used rbind.fill in the process).
Any help would be greatly appreciated,
Thanks!
In your original question, you say you'd like the result to be a data frame, but the result you show, res, is actually a matrix. Therefore, my first result below is a matrix, and then I convert it to a data frame with as.data.frame().
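This can be done fairly easily with sapply() and %in%. sapply() goes through the list one element at a time and, for each element x, evaluates possStr %in% x, which returns a logical vector marking which of the possible strings occur in that element. sapply() returns those vectors as columns, so t() transposes the result to give one row per list element.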
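To see what a single step of that sapply() call does, here is a minimal sketch that applies %in% to one element at a time. possStr and the sample vectors come from the example in this answer; the names one_row and na_row are only for illustration.

```r
possStr <- c("foo", "bar", "norf", "hello")

# One list element: which of the possible strings occur in it?
one_row <- possStr %in% c("foo", "bar", "norf")
one_row

# An NA element matches none of the possible strings,
# so every entry of that row is FALSE.
na_row <- possStr %in% NA
na_row
```

Each of these logical vectors becomes one row of the final matrix, which is why the NA element needs no special handling.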
> input <- list(c("foo", "bar", "norf"), "norf", NA, c("hello", "norf"))
> possStr <- c("foo", "bar", "norf", "hello")
> d <- t(sapply(input, function(x) possStr %in% x ))
> colnames(d) <- possStr
> d ## in matrix form
# foo bar norf hello
# [1,] TRUE TRUE TRUE FALSE
# [2,] FALSE FALSE TRUE FALSE
# [3,] FALSE FALSE FALSE FALSE
# [4,] FALSE FALSE TRUE TRUE
> as.data.frame(d) ## convert to data frame
# foo bar norf hello
# 1 TRUE TRUE TRUE FALSE
# 2 FALSE FALSE TRUE FALSE
# 3 FALSE FALSE FALSE FALSE
# 4 FALSE FALSE TRUE TRUE
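If the vector of possible strings is not already at hand, it could be derived from the list itself. The following is only a sketch of a variant, not a replacement for the code above: it assumes every string of interest appears at least once in input, and it swaps sapply() for vapply() so that R checks each result is a logical vector of the expected length. The name all_strings is made up for illustration.

```r
input <- list(c("foo", "bar", "norf"), "norf", NA, c("hello", "norf"))

# Assumption: every string of interest occurs somewhere in input.
# unlist() flattens the list; the NA placeholder is dropped explicitly.
all_strings <- unlist(input)
possStr <- unique(all_strings[!is.na(all_strings)])

# vapply() is like sapply(), but it errors if any result is not
# a logical vector of length(possStr).
d <- t(vapply(input, function(x) possStr %in% x, logical(length(possStr))))
colnames(d) <- possStr

res <- as.data.frame(d)
res
```

The columns then appear in order of first occurrence in the list rather than in a hand-picked order, so reorder them afterwards if the column order matters.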