
Delete rows with a specific string from a data table


I have a data table with 3 columns and 1000+ rows, generated in R. The first two columns are strings, and in some rows their trailing characters (which denote the cell type) are the same. I want to keep only the rows where the two cell types differ.

AAA.aa BBB.aa 0.9
AAA.aa BBB.bb 0.8
CCC.cc DDD.cc 0.7
CCC.cc BBB.bb 0.8

I want my output as:

AAA.aa BBB.bb 0.8
CCC.cc BBB.bb 0.8

Any help would be highly appreciated.


Solution

  • Keep only the part of each string that you want to compare (the text after the last dot), then keep the rows where those parts differ using !=

    subset(df, sub('.*\\.', '', V1) != sub('.*\\.', '', V2))
    
    #      V1     V2  V3
    #2 AAA.aa BBB.bb 0.8
    #4 CCC.cc BBB.bb 0.8
    

    The same condition also works in dplyr::filter:

    dplyr::filter(df, sub('.*\\.', '', V1) != sub('.*\\.', '', V2))
    

    Data:

    df <- structure(list(V1 = c("AAA.aa", "AAA.aa", "CCC.cc", "CCC.cc"), 
    V2 = c("BBB.aa", "BBB.bb", "DDD.cc", "BBB.bb"), V3 = c(0.9, 
    0.8, 0.7, 0.8)), class = "data.frame", row.names = c(NA, -4L))
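
To see why the condition above works, it may help to look at the intermediate values. The sketch below rebuilds the same sample data and splits the one-liner into steps. The helper name cell_type and the variables suffix1, suffix2, keep and result are made up for illustration and are not part of the original answer. Because '.*' is greedy, the pattern '.*\\.' matches everything up to and including the last dot, so only the trailing cell-type label is left.

```r
# Same sample data as in the answer above.
df <- data.frame(
  V1 = c("AAA.aa", "AAA.aa", "CCC.cc", "CCC.cc"),
  V2 = c("BBB.aa", "BBB.bb", "DDD.cc", "BBB.bb"),
  V3 = c(0.9, 0.8, 0.7, 0.8),
  stringsAsFactors = FALSE
)

# Hypothetical helper (illustration only): return the text after the last dot.
# '.*' is greedy, so '.*\\.' consumes everything up to and including the last dot.
cell_type <- function(x) sub(".*\\.", "", x)

suffix1 <- cell_type(df$V1)  # "aa" "aa" "cc" "cc"
suffix2 <- cell_type(df$V2)  # "aa" "bb" "cc" "bb"

# TRUE where the two cell types differ; those are the rows to keep.
keep <- suffix1 != suffix2   # FALSE TRUE FALSE TRUE

result <- df[keep, ]
print(result)
```

Note that sub() returns a string unchanged when it contains no dot, so a value without a suffix would be compared as a whole. If that can occur in the real data, it is worth checking for it first.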