I want to filter a dataframe using dplyr contains() and filter. Must be simple, right? The examples I've seen use base R grepl which sort of defeats the object. Here's a simple dataframe:
library(dplyr)
row_id <- seq(1, 6)
site_type <- c('Urban','Rural','Rural Background','Urban Background','Roadside','Kerbside')
df <- data.frame(row_id, site_type)
df <- as_tibble(df)
df
Now I want to filter the dataframe to all rows where site_type contains the string 'background'. I can match the string directly if I know the unique values of site_type:
filtered_df <- filter(df, site_type == 'Urban Background')
But I want to do something like:
filtered_df <- filter(df, site_type(contains('background', match_case = False)))
Any ideas how to do that? Can the dplyr helper contains() only be used with columns and not rows?
The contains() function in dplyr is a select helper. Its purpose is to help when using the select() function, and select() is focused on selecting columns, not rows. See documentation here.
filter() is the intended mechanism for selecting rows. The function you are probably looking for is grepl(), which does pattern matching on text.
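The reason grepl() fits inside filter() is that it returns one TRUE/FALSE per element, and filter() keeps the rows where the condition is TRUE. A quick base R check on the site_type vector from the question:

```r
site_type <- c("Urban", "Rural", "Rural Background",
               "Urban Background", "Roadside", "Kerbside")

# One logical value per element; ignore.case = TRUE makes "background" match "Background"
is_background <- grepl("background", site_type, ignore.case = TRUE)
is_background
#> [1] FALSE FALSE  TRUE  TRUE FALSE FALSE
```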
So the solution you are looking for is probably:
filtered_df <- filter(df, grepl("background", site_type, ignore.case = TRUE))
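If you would rather stay within the tidyverse, stringr's str_detect() does the same job as grepl(); wrapping the pattern in regex(..., ignore_case = TRUE) makes the match case-insensitive. A sketch, assuming stringr is installed and row_id is 1:6 as above:

```r
library(dplyr)
library(stringr)

df <- tibble(
  row_id = 1:6,
  site_type = c("Urban", "Rural", "Rural Background",
                "Urban Background", "Roadside", "Kerbside")
)

# str_detect() returns a logical vector, just like grepl(), so filter() can use it directly
filtered_df <- filter(df, str_detect(site_type, regex("background", ignore_case = TRUE)))
filtered_df$site_type
#> [1] "Rural Background" "Urban Background"
```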
I suspect that contains() is mostly a wrapper applying grepl() to the column names, so the logic is very similar.
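As a rough illustration of that analogy (not contains()'s actual source), selecting columns with contains() picks the same names as running grepl() over names(df). contains() ignores case by default, so the grepl() call passes ignore.case = TRUE to match:

```r
library(dplyr)

df <- tibble(
  row_id = 1:6,
  site_type = c("Urban", "Rural", "Rural Background",
                "Urban Background", "Roadside", "Kerbside")
)

# Column selection with the helper (case-insensitive by default)
via_contains <- names(select(df, contains("TYPE")))

# The same selection done by hand: grepl() applied to the column names
via_grepl <- names(df)[grepl("type", names(df), ignore.case = TRUE)]

identical(via_contains, via_grepl)
#> [1] TRUE
```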
References: