
Count how many times strings from one data frame appear in another data frame in R dplyr


I have two data frames that look like this:

df1 <- data.frame(reference=c("cat","dog"))
print(df1)
#>   reference
#> 1       cat
#> 2       dog
df2 <- data.frame(data=c("cat","car","catt","cart","dog","dog","pitbull"))
print(df2)
#>      data
#> 1     cat
#> 2     car
#> 3    catt
#> 4    cart
#> 5     dog
#> 6     dog
#> 7 pitbull

Created on 2021-12-29 by the reprex package (v2.0.1)

I want to find how many times the words cat and dog from df1 occur in df2. I want my result to look like this:

animals   n
cat       1
dog       2

Any help or guidance is appreciated. My reference list is huge. I tried to grep each word individually, but that will take too long.

Thank you for your time. Happy holidays


Solution

  • A possible solution, tidyverse-based:

    library(tidyverse)
    
    df1 <- data.frame(reference=c("cat","dog"))
    df2 <- data.frame(data=c("cat","car","catt","cart","dog","dog","pitbull"))
    
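    # One group per reference word, so `reference` is a single string inside summarise()
    df1 %>%
      group_by(animal = reference) %>%
      # Count exact (not partial) matches of that word in df2$data
      summarise(n = sum(reference == df2$data), .groups = "drop")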
    
    #> # A tibble: 2 × 2
    #>   animal     n
    #>   <chr>  <int>
    #> 1 cat        1
    #> 2 dog        2
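
Since the question mentions that the reference list is huge, a join-based variant may scale better, because it tabulates df2 once instead of comparing every reference word against the whole column. The following is only a sketch under that assumption, not a benchmarked result; the names `counts` and `result` are illustrative and do not come from the original post.

```r
library(dplyr)

df1 <- data.frame(reference = c("cat", "dog"))
df2 <- data.frame(data = c("cat", "car", "catt", "cart", "dog", "dog", "pitbull"))

# Tabulate every distinct value of df2$data once
counts <- count(df2, data, name = "n")

# Keep only the reference words, in df1's order (exact matches only)
result <- df1 %>%
  left_join(counts, by = c("reference" = "data")) %>%
  mutate(n = coalesce(n, 0L)) %>%  # reference words absent from df2 get 0
  rename(animal = reference)

print(result)
```

If only the counts are needed, base R's `table(factor(df2$data, levels = df1$reference))` should give the same numbers without loading any package.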