I am trying to troubleshoot my data, and check whether a certain name appears in two different columns in the same row (same observation):
df1 <- data.frame(
text1 = c("John Jay Jakson",
"John Jay Jakson",
"John Jay Jakson",
"John Jack Jakson"),
text2 = c("Jerry Jack Jameson",
"Jerry Jack Jameson",
"Jerry Jack Jameson",
"Jerry Jack Jameson"))
df2 <- data.frame(
names = c("John", "Jay", "Jackson", "Jerry", "Jack", "Jameson"))
The code I've come up with is the following
data.check = sapply(df2$names, function(x) (grepl(x, df1$text1) & grepl(x, df1$text2))==TRUE)
or alternatively:
which(sapply(df2$names, function(x) (grepl(x, df1$text1) & grepl(x, df1$text2))==TRUE))
But these are not the best ways to sift through the data. Instead, I want to have a new column in df1, df1$check, that holds a 1/0 based on whether each row has the same name in that row under df1$text1 and df1$text2.
I know that assigning this code to a new column won't work:
df1$check = sapply(df2$names, function(x) (grepl(x, df1$text1) & grepl(x, df1$text2))==TRUE)
it gives me false for the 4th row, which should be a true.
Any help appreciated, thank you.
The output from sapply
in the OP's code returns a logical matrix
.
> sapply(df2$names, function(x) (grepl(x, df1$text1) & grepl(x, df1$text2)))
John Jay Jackson Jerry Jack Jameson
[1,] FALSE FALSE FALSE FALSE FALSE FALSE
[2,] FALSE FALSE FALSE FALSE FALSE FALSE
[3,] FALSE FALSE FALSE FALSE FALSE FALSE
[4,] FALSE FALSE FALSE FALSE TRUE FALSE
Each column of the matrix
should be converged to a single logical value to create a vector. We may wrap with rowSums
on the logical matrix and then convert the row wise sum to logical vector (> 0
) and coerce it back to binary (+
- TRUE
-> 1, FALSE
-> 0)
df1$check <- +(rowSums(sapply(df2$names, function(x)
(grepl(x, df1$text1) & grepl(x, df1$text2)))) > 0)
df1$check
[1] 0 0 0 1
Or another option is to loop with lapply
, return a list
and use Reduce
with |
to return a vector
df1$check <- +(Reduce(`|`, lapply(df2$names, function(x)
(grepl(x, df1$text1) & grepl(x, df1$text2)))))