I have a data frame, df
, having one column of different names. I have variable data frames, e.g. search_df
or search_df1
containing search words which I would like to search via regex in the name column.
If the word has been found write it into a new column, e.g. df_final$which_word_search_df
.
If more than one word has been found I would like to paste the results together.
The result should look like df_final
.
# load packages
pacman::p_load(tidyverse)
# words I would like to search for
search_df <- data.frame(search_words = c("apple", "peach"))
search_df1 <- data.frame(search_words = c("strawberry", "peach", "banana"))
# data frame which is the basis for my search
df <- data.frame(name = c("apple123", "applepeach", "peachtime", "peachab", "bananarrr", "bananaxy"))
# how I expect the final result to look like
df_final <- data.frame(name = c("apple123", "applepeach", "peachtime", "peachab", "bananarrr", "bananaxy"),
which_word_search_df = c("apple", "apple; peach", "peach", "peach", NA, NA),
which_word_search_df1 = c(NA, NA, "peach", "peach", "banana", "banana"))
That is my current solution but as you can see it is not dynamic. I type in manually every search word instead of automatically going through all the search words.
df_trial <- df %>%
mutate(which_search_word_trial = ifelse(grepl("apple", name, ignore.case = T), "apple", ""),
which_search_word_trial = ifelse(grepl("peach", name, ignore.case = T),
paste(which_search_word_trial, "peach", sep = ";"), which_search_word_trial)
)
The example I am sharing is just a minimal one. For the actual use case df
will have ~200k rows and my search_df
will have ~1k rows.
We can do the following.
library(dplyr)
library(stringr)
df %>%
mutate(which_word_search_df = str_extract_all(name,str_c(search_df$search_words, collapse = '|')),
which_word_search_df1 = str_extract_all(name, str_c(search_df1$search_words, collapse = '|')))
# name which_word_search_df which_word_search_df1
# 1 apple123 apple
# 2 applepeach apple, peach peach
# 3 peachtime peach peach
# 4 peachab peach peach
# 5 bananarrr banana
# 6 bananaxy banana