I am trying to pattern-match all my targeted tissues ("heart", "muscle", "kidney", "liver")
in a data frame (pasted below) and list the names of the species that have all of the targeted tissues.
Data:
df <- read.csv(text =
"Species,Tissue
Human,Kr_liver_2
Human,Heart
Human,Liver_556
Human,Kr_Kidney_2
Human,Kr_Muscle_2
Human,Kr_Brain_2
Mouse,Brain
Mouse,Kr_liver_3
Mouse,Kr_liver_5
Mouse,Kr_liver_27")
I tried the approach below but got an empty output. Based on the data frame above, the desired output is 'Human', because it has all of the targeted tissues.
library(dplyr)
target_tissues <- c("heart", "muscle", "kidney", "liver")
Tissue_check <- df %>%
  group_by(Species) %>%
  filter(all(grepl(paste(target_tissues, collapse = "|"), tolower(Tissue)))) %>%
  pull(Species) %>%
  unique()
How can I achieve this?
You can paste all elements of the Tissue column into one string per species, and then check whether every target tissue is found in that string.
library(dplyr)
target <- c("heart", "muscle", "kidney", "liver")
# The .by argument requires dplyr >= 1.1.0; use group_by(Species) on older versions
df %>%
  filter(all(sapply(target, grepl, toString(Tissue), ignore.case = TRUE)),
         .by = Species)
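To see why this works where the original attempt returned nothing, it may help to compare the two checks for a single species. The original condition asks whether every row matches some target tissue, which fails for Human because of the Kr_Brain_2 row; the version above asks whether every target tissue appears somewhere in the group. A minimal base R sketch using the Human rows from the question (the names `human`, `per_row`, and `per_tissue` are only illustrative):

```r
target <- c("heart", "muscle", "kidney", "liver")
human <- c("Kr_liver_2", "Heart", "Liver_556",
           "Kr_Kidney_2", "Kr_Muscle_2", "Kr_Brain_2")

# Original check: does every row match at least one target tissue?
per_row <- grepl(paste(target, collapse = "|"), tolower(human))
all(per_row)      # FALSE, because "Kr_Brain_2" matches none of the targets

# Check used above: does every target tissue occur somewhere in the group?
per_tissue <- sapply(target, grepl, toString(human), ignore.case = TRUE)
all(per_tissue)   # TRUE, so the Human rows are kept
```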
An alternative with stringr:
library(stringr)
df %>%
  filter(all(str_detect(toString(Tissue), fixed(target, ignore_case = TRUE))),
         .by = Species)
#   Species      Tissue
# 1   Human  Kr_liver_2
# 2   Human       Heart
# 3   Human   Liver_556
# 4   Human Kr_Kidney_2
# 5   Human Kr_Muscle_2
# 6   Human  Kr_Brain_2
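Both snippets return the matching rows rather than the species names the question asked for. One possible way to reduce the result to names, assuming dplyr 1.1.0 or later for the `.by` argument, is to append `distinct()` and `pull()`. The object name `species_with_all` is just a placeholder, and `stringsAsFactors = FALSE` is added here only so the result is a character vector on any R version:

```r
library(dplyr)

# Data from the question
df <- read.csv(text =
"Species,Tissue
Human,Kr_liver_2
Human,Heart
Human,Liver_556
Human,Kr_Kidney_2
Human,Kr_Muscle_2
Human,Kr_Brain_2
Mouse,Brain
Mouse,Kr_liver_3
Mouse,Kr_liver_5
Mouse,Kr_liver_27", stringsAsFactors = FALSE)

target <- c("heart", "muscle", "kidney", "liver")

species_with_all <- df %>%
  # keep only species whose combined Tissue string contains every target
  filter(all(sapply(target, grepl, toString(Tissue), ignore.case = TRUE)),
         .by = Species) %>%
  # collapse to one row per species, then extract the column as a vector
  distinct(Species) %>%
  pull(Species)

species_with_all
# [1] "Human"
```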