Given a dataframe of types and values like so:
topic | keyword |
---|---|
cheese | cheddar |
meat | beef |
meat | chicken |
cheese | swiss |
bread | focaccia |
bread | sourdough |
cheese | gouda |
My aim is to build a set of regexes dynamically, one per topic, but I don't know how to create the variable names from the topic values. I can do it for one topic at a time like so:
```r
library(dplyr)

fn_get_topic_regex <- function(targettopic, df) {
  filter_df <- df |>
    filter(topic == targettopic)
  regex <- paste(filter_df$keyword, collapse = "|")
}
```
and then do things like:

```r
cheese_regex <- fn_get_topic_regex("cheese", df)
```
But what I'd like to be able to do is build all these regexes automatically without having to define each one.
The intended output would be something like:
```
cheese_regex: "cheddar|swiss|gouda"
bread_regex: "focaccia|sourdough"
meat_regex: "beef|chicken"
```

where each variable name starts with the corresponding topic.
What's the best way to do that without defining each regex individually by hand?
You can use dplyr's `group_by()` and `summarise()`:
```r
df %>%
  group_by(topic) %>%
  summarise(regex = paste(keyword, collapse = "|"))
```

```
# A tibble: 3 × 2
  topic  regex
  <chr>  <chr>
1 bread  focaccia|sourdough
2 cheese cheddar|swiss|gouda
3 meat   beef|chicken
```
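If you want the `*_regex` names from the question, one option is to turn the summarised tibble into a named character vector and, only if you really need separate variables, copy it into the global environment with base R's `list2env()`. This is a sketch assuming dplyr is installed; `regex_df` and `regexes` are illustrative names, not part of the original answer:

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  topic   = c("cheese", "meat", "meat", "cheese", "bread", "bread", "cheese"),
  keyword = c("cheddar", "beef", "chicken", "swiss", "focaccia", "sourdough", "gouda")
)

# One row per topic, with that topic's keywords joined by "|"
regex_df <- df %>%
  group_by(topic) %>%
  summarise(regex = paste(keyword, collapse = "|"))

# Two columns -> named character vector, e.g. c(bread_regex = "focaccia|sourdough", ...)
regexes <- setNames(regex_df$regex, paste0(regex_df$topic, "_regex"))

# Optional: create one variable per topic (bread_regex, cheese_regex, meat_regex)
# in the global environment; invisible() suppresses printing the environment
invisible(list2env(as.list(regexes), envir = globalenv()))

cheese_regex
```

Keeping the regexes in one named vector (`regexes[["cheese_regex"]]`) is usually easier to loop over than a set of loose variables, so the `list2env()` step is only worth it if other code expects the separate names.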
Or you can apply your function to every unique value in `df$topic`:
```r
library(purrr)  # for map_chr()

map_chr(unique(df$topic) %>% setNames(paste0(., "_regex")),
        fn_get_topic_regex, df = df)
```

```
         cheese_regex            meat_regex           bread_regex
"cheddar|swiss|gouda"        "beef|chicken"  "focaccia|sourdough"
```
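For comparison, here is a base R sketch of the same idea that needs neither dplyr nor purrr: `split()` groups the keywords by topic and `vapply()` pastes each group together. The name `regexes` is illustrative:

```r
# Example data from the question
df <- data.frame(
  topic   = c("cheese", "meat", "meat", "cheese", "bread", "bread", "cheese"),
  keyword = c("cheddar", "beef", "chicken", "swiss", "focaccia", "sourdough", "gouda")
)

# split() returns a list of keyword vectors named by topic (sorted alphabetically);
# vapply() collapses each one into a single "a|b|c" string
regexes <- vapply(split(df$keyword, df$topic), paste, character(1), collapse = "|")

# Rename "cheese" -> "cheese_regex", etc.
names(regexes) <- paste0(names(regexes), "_regex")

regexes
```

Unlike the `unique()`-based version, the topics here come out in alphabetical order (bread, cheese, meat) because `split()` orders groups by factor level.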
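Also note that your function ends with an assignment, so it returns its value invisibly. That still works, but it is clearer to end the function with `return(regex)`, or not to assign the last line to a variable at all. I would even put everything in a single pipe chain: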
```r
fn_get_topic_regex <- function(targettopic, df) {
  df |>
    filter(topic == targettopic) |>
    pull(keyword) |>
    paste(collapse = "|")
}
```
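Once the regexes are in a named vector, using them means looping over that vector. A minimal sketch with base R's `grepl()`; the `menu` strings are made-up example input, and the vector is typed in by hand to match the `map_chr()` output shown earlier in this answer:

```r
# Named vector in the shape produced by the map_chr() call
regexes <- c(
  cheese_regex = "cheddar|swiss|gouda",
  meat_regex   = "beef|chicken",
  bread_regex  = "focaccia|sourdough"
)

# Hypothetical example strings to classify
menu <- c("ham and swiss on sourdough", "grilled chicken", "plain water")

# sapply() calls grepl(pattern, x = menu) once per regex, giving a logical
# matrix: one column per topic regex, one row per input string
matches <- sapply(regexes, grepl, x = menu)
matches
```

Here the first string matches both the cheese and bread regexes, the second matches only the meat regex, and the third matches none.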