Filtering rows based on partial matching between a data frame and a vector


I have a data frame and want to filter its rows based on partial matches between the names in its first column and the names in a vector.

nam <- c('mmu_mir-1-3p','mmu_mir-1-5p','mmu-mir-3-5p','mir-4','mmu-mir-6-3p') # factor
aa <- c(12854, 36, 5489, 54485, 2563) # numeric
df <- data.frame(nam, aa)

vector <- c('mir-1','mir-3','mir-6')

I need a new data frame that keeps only the rows where df$nam partially matches one of the names in vector. new_df should look like this:

new_nam <- c('mmu_mir-1-3p','mmu_mir-1-5p','mmu-mir-3-5p','mmu-mir-6-3p') # factor
new_aa <- c(12854, 36, 5489, 2563) # numeric
new_df <- data.frame(new_nam, new_aa)

Solution

  • We can paste the elements of 'vector' into a single string separated by | and use that as a regex pattern in grepl or str_detect to filter the rows

    library(dplyr)
    library(stringr)
    df %>% 
       filter(str_detect(nam, str_c(vector, collapse="|")))
    #           nam    aa
    #1 mmu_mir-1-3p 12854
    #2 mmu_mir-1-5p    36
    #3 mmu-mir-3-5p  5489
    #4 mmu-mir-6-3p  2563
    

    In base R, this can be done with subset/grepl

    subset(df, grepl(paste(vector, collapse= "|"), nam))
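To make the approach above concrete, here is a minimal base R sketch of what the collapsed pattern looks like and how it filters the rows. The data are the question's example (with aa written as numbers); the names pattern, res, strict and df2 are introduced here only for illustration. One caveat worth checking on real data: as a plain regex, 'mir-1' also matches names such as 'mmu-mir-10-5p'. The stricter variant below is an assumption about what a match should mean (the name must not be followed by another digit), not part of the original answer.

```r
# Example data from the question
nam <- c('mmu_mir-1-3p', 'mmu_mir-1-5p', 'mmu-mir-3-5p', 'mir-4', 'mmu-mir-6-3p')
aa <- c(12854, 36, 5489, 54485, 2563)
df <- data.frame(nam, aa)
vector <- c('mir-1', 'mir-3', 'mir-6')

# Collapsing with "|" builds a single alternation regex
pattern <- paste(vector, collapse = "|")
pattern
# "mir-1|mir-3|mir-6"

# Logical indexing keeps the same rows as subset() with the same grepl() call
res <- df[grepl(pattern, df$nam), ]

# Hypothetical second data frame showing the over-matching caveat
df2 <- data.frame(nam = c('mmu-mir-1-3p', 'mmu-mir-10-5p'))

# Stricter pattern (assumption): the name must be followed by a non-digit
# or by the end of the string
strict <- paste0("(", pattern, ")([^0-9]|$)")

grepl(pattern, df2$nam)  # TRUE TRUE: 'mir-1' also hits 'mir-10'
grepl(strict, df2$nam)   # TRUE FALSE
```

If the names in the vector could contain regex metacharacters such as . or +, they would need escaping before being collapsed; the names in this example contain none.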