Search code examples
rdataframecomparesubsetstring-matching

R subsetting dataframe on partial string match or separating or split


I am attempting to subset a dataframe by a partial string match. split and compare may also work since the string can be split by " | " I believe that I used %in% in a past similar case but it does not work for this. Any suggestions?

df <- read.table(text="
col1    cOL2      
1   '2.16.840.1.113883.10.20.22.4.19 | 2.16.840.1.113883.10.20.22.4.19 | 2.16.840.1.113883.10.20.1.47'
2   '2.16.840.1.113883.10.20.22.4.4 | 2.16.840.1.113883.10.20.22.4.4 | 2.16.840.1.113883.10.20.22.4.64'
3   '2.16.840.1.113883.10.20.22.4.64 | 2.16.840.1.113883.10.20.22.4.78 | 2.16.840.1.113883.10.20.1.47'
4    '2.16.840.1.113883.10.20.22.4.19 | 2.16.840.1.113883.10.20.22.4.19 | 2.16.840.1.113883.10.20.1.47'
5   '2.16.840.1.113883.10.20.22.4.19 | 2.16.840.1.113883.10.20.22.4.19 | 2.16.840.1.113883.10.20.1.47'
", header=T, stringsAsFactors=FALSE)
df[which(df$cOL2 == 1 & df$cOL2 %in% '2.16.840.1.113883.10.20.22.4.19' ),]

Solution

  • Using base R functions, you could do:

    subset(df, grepl('2.16.840.1.113883.10.20.22.4.19', cOL2))
      col1                                                                                             cOL2
    1    1 2.16.840.1.113883.10.20.22.4.19 | 2.16.840.1.113883.10.20.22.4.19 | 2.16.840.1.113883.10.20.1.47
    4    4 2.16.840.1.113883.10.20.22.4.19 | 2.16.840.1.113883.10.20.22.4.19 | 2.16.840.1.113883.10.20.1.47
    5    5 2.16.840.1.113883.10.20.22.4.19 | 2.16.840.1.113883.10.20.22.4.19 | 2.16.840.1.113883.10.20.1.47