
Match patterns in a vector against strings in a data frame


I have a data frame with two columns and a vector of names. How can I select the rows of the data frame whose name matches one of the strings in the vector?

name <- c("p4@HPS1", "p7@HPS2", "p4@HPS3", "p7@HPS4", "p7@HPS5", "p9@HPS6", "p11@HPS7", "p10@HPS8", "p15@HPS9")
expression <- c(118.84, 90.04, 106.6, 104.99, 93.2, 66.84, 90.02, 108.03, 111.83)
# data.frame() keeps 'expression' numeric; as.data.frame(cbind(...)) would coerce it to character
dataset <- data.frame(name, expression, stringsAsFactors = FALSE)
nam <- c("HPS5", "HPS6", "HPS9", "HPS2")

The function should return a data frame containing only the specified rows. I tried dataset[mapply(grepl,nam,dataset$name)] but it didn't work.


Solution

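  • We can paste the elements of 'nam' together with collapse="|" to build a single alternation pattern, pass it as the pattern argument to grep to get the indices of the matching rows, and use those indices to subset 'dataset'.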

    dataset[grep(paste(nam, collapse="|"), dataset$name),]
    

    To make the OP's code work, wrap the 'name' column in a list. Otherwise mapply pairs up the individual elements of 'name' and 'nam', and because the two vectors have different lengths (9 and 4), it gives a warning that the longer argument is not a multiple of the length of the shorter. With the list, mapply calls grepl once per pattern on the whole column and returns a logical matrix with one column per pattern. We then take the rowSums and check whether each is greater than 0 to get a logical vector for subsetting the rows.

    dataset[rowSums(mapply(grepl, nam, list(dataset$name)))>0,]
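For reference, both approaches can be run side by side on the question's data. This is only a sketch: the intermediate names (pattern, hits, by_regex, by_matrix) are made up for illustration, and the data frame is built with data.frame() rather than cbind() on the assumption that 'expression' should stay numeric.

```r
# Sample data from the question
name <- c("p4@HPS1", "p7@HPS2", "p4@HPS3", "p7@HPS4", "p7@HPS5",
          "p9@HPS6", "p11@HPS7", "p10@HPS8", "p15@HPS9")
expression <- c(118.84, 90.04, 106.6, 104.99, 93.2, 66.84, 90.02, 108.03, 111.83)
dataset <- data.frame(name, expression, stringsAsFactors = FALSE)
nam <- c("HPS5", "HPS6", "HPS9", "HPS2")

# Approach 1: collapse the names into one alternation pattern
pattern <- paste(nam, collapse = "|")          # "HPS5|HPS6|HPS9|HPS2"
by_regex <- dataset[grepl(pattern, dataset$name), ]

# Approach 2: one grepl() call per pattern over the whole column;
# 'hits' is a logical matrix with one row per data row, one column per pattern
hits <- mapply(grepl, nam, list(dataset$name))
by_matrix <- dataset[rowSums(hits) > 0, ]

# Both approaches should select the same rows
stopifnot(identical(by_regex$name, by_matrix$name))
print(by_regex)
```

Note that the patterns are matched as substrings, so a pattern such as "HPS1" would also match a name ending in "HPS10" if the data contained one.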