
Count how many times each string from one column appears (as a substring, not an exact match) in another column in R


My data looks like this:

df <- data.frame(id = c("p3", "p5", "p8", "p9", "p10", "p11"), pedi = c("p1/p2", "p3/p4", "p3/p5", "(p3/p4)/p5", "p5/p8", "p4/p10"))

I tried this loop:

id <- df$id 
for (i in length(id)) {
  df$id_in_pedi <- sum(grepl(i, df$pedi))
}


But it does not work. The result I am looking for is this:

df <- data.frame(id = c("p3", "p5", "p8", "p9", "p10", "p11"),
                 pedi = c("p1/p2", "p3/p4", "p3/p5", "(p3/p4)/p5", "p5/p8", "p4/p10"),
                 id_in_pedi = c(3,3,1,0,1,0))


Thanks


Solution

  • With the tidyverse:

    library(tidyverse)
    df %>%
       mutate(id_in_pedi = str_count(toString(pedi), id))
    
       id       pedi id_in_pedi
    1  p3      p1/p2          3
    2  p5      p3/p4          3
    3  p8      p3/p5          1
    4  p9 (p3/p4)/p5          0
    5 p10      p5/p8          1
    6 p11     p4/p10          0
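Note that the tidyverse version collapses `pedi` into a single string with `toString()` and then counts every occurrence of each id (treated as a regex), whereas the `grepl()`-based versions in base R count how many rows contain the id at least once. The two agree on this data only because no pedigree mentions the same id twice. A minimal base R sketch of the difference, using hypothetical toy data in which one pedigree contains `p3` twice:

```r
# Hypothetical toy data: the first pedigree mentions p3 twice
pedi <- c("p3/p3", "p3/p4", "p1/p2")

# Rows containing "p3" at least once (what the grepl()-based answers count)
rows_with <- sum(grepl("p3", pedi, fixed = TRUE))

# Total occurrences of "p3" (what str_count(toString(pedi), "p3") counts),
# computed in base R with gregexpr() + regmatches()
occurrences <- sum(lengths(regmatches(pedi, gregexpr("p3", pedi, fixed = TRUE))))

rows_with    # 2
occurrences  # 3
```

Pick whichever count matches the question you are asking; if an id can only appear once per pedigree, either works.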
    

    In base R, using sapply():

    transform(df, id_in_pedi = colSums(sapply(id, grepl, pedi, USE.NAMES = FALSE)))
    
       id       pedi id_in_pedi
    1  p3      p1/p2          3
    2  p5      p3/p4          3
    3  p8      p3/p5          1
    4  p9 (p3/p4)/p5          0
    5 p10      p5/p8          1
    6 p11     p4/p10          0
    
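Both approaches above match substrings, so an id such as `p1` (hypothetical, not in the example data) would also be counted in `"p4/p10"`. If ids should only match whole tokens, one option is to split each pedigree on anything that is not a letter or digit and test membership with `%in%`. A sketch under that assumption:

```r
pedi <- c("p1/p2", "p3/p4", "p3/p5", "(p3/p4)/p5", "p5/p8", "p4/p10")
ids  <- c("p1", "p3", "p10")   # "p1" is a hypothetical extra id for illustration

# Substring matching: "p1" also hits "p10" in the last pedigree
substring_hits <- colSums(sapply(ids, grepl, pedi, fixed = TRUE))

# Token matching: split each pedigree on non-alphanumeric characters,
# then count the pedigrees whose tokens contain the id exactly
tokens <- strsplit(pedi, "[^[:alnum:]]+")
count_tokens <- function(p) sum(vapply(tokens, function(t) p %in% t, logical(1)))
token_hits <- vapply(ids, count_tokens, integer(1))

substring_hits  # p1 = 2, p3 = 3, p10 = 1
token_hits      # p1 = 1, p3 = 3, p10 = 1
```

A regex with word boundaries (e.g. `"\\bp1\\b"`) would also work, but then any regex metacharacters in the ids would need escaping; the token split avoids that.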

    Or, in base R with Vectorize(), which returns a named vector instead of adding a column:

    colSums(Vectorize(grepl)(df$id, list(df$pedi)))
     p3  p5  p8  p9 p10 p11 
      3   3   1   0   1   0
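For reference, the original loop fails for three reasons: `for (i in length(id))` runs only once, with `i = 6`; `grepl(i, df$pedi)` searches for the number `6` rather than the i-th id; and `df$id_in_pedi <- ...` overwrites the whole column with a single value. A minimal corrected version of that loop (adding `fixed = TRUE` so ids are matched literally rather than as regexes) might look like this:

```r
df <- data.frame(id = c("p3", "p5", "p8", "p9", "p10", "p11"),
                 pedi = c("p1/p2", "p3/p4", "p3/p5", "(p3/p4)/p5", "p5/p8", "p4/p10"),
                 stringsAsFactors = FALSE)

df$id_in_pedi <- 0L                       # one result slot per row
for (i in seq_along(df$id)) {             # loop over 1..n, not just n
  # the pattern is the i-th id string (not the loop index i);
  # the count is stored in row i only
  df$id_in_pedi[i] <- sum(grepl(df$id[i], df$pedi, fixed = TRUE))
}
df$id_in_pedi  # 3 3 1 0 1 0
```

The vectorised answers above do the same work without an explicit loop, but the loop makes each step visible.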