I have a data table consisting of 3 columns and 1000+ rows, generated in R. The first two columns are strings, and in some rows the suffix after the dot (which denotes the cell type) is the same in both columns. I want to keep only the rows where the two suffixes differ.
AAA.aa BBB.aa 0.9
AAA.aa BBB.bb 0.8
CCC.cc DDD.cc 0.7
CCC.cc BBB.bb 0.8
I want my output as:
AAA.aa BBB.bb 0.8
CCC.cc BBB.bb 0.8
Any help would be highly appreciated.
Extract only the part of each string that you want to compare (here, everything after the last dot) with sub(), then keep the rows where the two parts differ using !=
subset(df, sub('.*\\.', '', V1) != sub('.*\\.', '', V2))
# V1 V2 V3
#2 AAA.aa BBB.bb 0.8
#4 CCC.cc BBB.bb 0.8
The same condition can also be used in dplyr::filter:
dplyr::filter(df, sub('.*\\.', '', V1) != sub('.*\\.', '', V2))
data
df <- structure(list(V1 = c("AAA.aa", "AAA.aa", "CCC.cc", "CCC.cc"),
V2 = c("BBB.aa", "BBB.bb", "DDD.cc", "BBB.bb"), V3 = c(0.9,
0.8, 0.7, 0.8)), class = "data.frame", row.names = c(NA, -4L))
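To see why the comparison works, it can help to look at the extraction step on its own. The following is a minimal sketch in base R; get_suffix is a hypothetical helper name introduced only for illustration, and the sample data is rebuilt so the snippet runs by itself.

```r
# Same sample data as above, rebuilt here so the snippet is self-contained
df <- data.frame(
  V1 = c("AAA.aa", "AAA.aa", "CCC.cc", "CCC.cc"),
  V2 = c("BBB.aa", "BBB.bb", "DDD.cc", "BBB.bb"),
  V3 = c(0.9, 0.8, 0.7, 0.8),
  stringsAsFactors = FALSE
)

# get_suffix is a hypothetical helper: the greedy '.*\\.' matches everything
# up to and including the last dot, so only the text after it remains
get_suffix <- function(x) sub('.*\\.', '', x)

get_suffix(df$V1)
# [1] "aa" "aa" "cc" "cc"
get_suffix(df$V2)
# [1] "aa" "bb" "cc" "bb"

# Logical vector marking the rows whose two suffixes differ
keep <- get_suffix(df$V1) != get_suffix(df$V2)
df[keep, ]
```

Note that if a value contains no dot, sub() returns it unchanged, so such rows would be compared on the whole string.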