r dplyr stringr tidyselect

Difference between using str_detect() and contains()?


I know it might be a silly question, but I was curious whether there is any difference. I prefer str_detect() because its syntax makes more sense to me.


Solution

  • Yes, there are substantial differences. First, contains() is a "selection helper" that must be used within a selecting function (generally a tidyverse one).

    So you can't call contains() on a vector as a standalone function; i.e., you can't do:

    x <- c("Hello", "and", "welcome (example)") 
    
    tidyselect::contains("Hello", x)
    

    Or you get the error:

    Error: ! contains() must be used within a selecting function.

    By contrast, stringr::str_detect() works on vectors and as a standalone function:

    stringr::str_detect(x, "Hello")
    

    Returns:

    [1]  TRUE FALSE FALSE
    

    Secondly, stringr::str_detect() accepts regex, whereas tidyselect::contains() only matches literal strings.

    So, for example, the below works:

    library(dplyr)  # provides %>%, select() and filter()
    
    df <- data.frame(col1 = c("Hello", "and", "welcome (example)"))
    
    df %>% 
      select(contains("1"))
    
    #               col1
    # 1             Hello
    # 2               and
    # 3 welcome (example)
    

    But this does not select anything, because contains() treats "\\d" as literal text rather than as a regex:

    df %>% select(contains("\\d"))
    

    (\\d is the R regex for "any digit")

    Additionally, as noted by @abagail, contains() looks at column names, not at the values stored within the columns. For instance, df %>% select(contains("1")) worked above to return the column col1 (since there is a "1" in the column name). But trying to filter on the values that contain a certain pattern does not work:

    df %>% 
      filter(contains("Hello"))
    

    Returns the same error:

    Caused by error: ! contains() must be used within a selecting function.

    But you can filter on the values in the columns using stringr::str_detect():

    df %>% 
      filter(stringr::str_detect(col1, "Hello"))
    
    #    col1
    # 1 Hello
    

    Lastly, if you are looking for a similar function outside of stringr, tidyselect::matches() accepts regex. As @GregorThomas aptly points out in the comments:

    "tidyselect::matches is a much closer analog to str_detect() --though still as a selection helper it is only for use within a selecting function."

    str_detect() is also roughly equivalent to base R's grepl(), though the order of the pattern and string arguments is reversed (i.e., str_detect(string, pattern) corresponds to grepl(pattern, string)).
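
    As a rough sketch of the last two points, the snippet below reuses the same df and x as above. It is only an illustration with a simple pattern; stringr uses the ICU regex engine while grepl() uses base R's, so more exotic patterns may not behave identically. The names by_regex and by_literal are made up for this example.

    ```r
    library(dplyr)
    library(stringr)

    df <- data.frame(col1 = c("Hello", "and", "welcome (example)"))
    x <- c("Hello", "and", "welcome (example)")

    # matches() takes a regex, so "\\d" finds the digit in the column name "col1"
    by_regex <- df %>% select(matches("\\d"))
    names(by_regex)
    # [1] "col1"

    # contains() treats the same pattern as literal text, so no column is selected
    by_literal <- df %>% select(contains("\\d"))
    ncol(by_literal)
    # [1] 0

    # Same pattern, same vector: only the argument order differs
    identical(str_detect(x, "^H"), grepl("^H", x))
    # [1] TRUE
    ```

    Note that matches() is still a selection helper, so it has the same restriction as contains(): it only works inside a selecting function such as select().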