Remove rows that contain different letters or missing data for two columns

I'm analyzing a big dataset in RStudio and I'm not very experienced in programming. I would like to remove the rows where the letters in columns CONSENSUSMAP and SVEVOMAP differ, and also the rows where CONSENSUSMAP is missing.

Here is an example table:

CLONEID | CONSENSUSMAP| SVEVOMAP
1228104 |      NA     |    chr1A
2277691 |      NA     |    chr1A
2277607 |      1A     |    chr1A
1E+08   |      NA     |    chr1A
1229677 |      1B     |    chr1A
1126457 |      7B     |    chr7B

I would like to obtain the following output:

CLONEID | CONSENSUSMAP| SVEVOMAP
2277607 |       1A    |    chr1A
1126457 |       7B    |    chr7B

I tried some code, but none of it handles these specific conditions. Any suggestions?

Solution

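The following dplyr solution does what the question asks: it drops the rows where CONSENSUSMAP is missing, strips the non-digit prefix from SVEVOMAP into a temporary column, keeps the rows where that column equals CONSENSUSMAP, and then removes the temporary column.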

library(dplyr)

df1 %>%
  filter(!is.na(CONSENSUSMAP)) %>%
  mutate(newcol = sub("^[^[:digit:]]*(\\d+.*$)", "\\1", SVEVOMAP)) %>%
  filter(CONSENSUSMAP == newcol) %>%
  select(-newcol)
#  CLONEID CONSENSUSMAP SVEVOMAP
#1 2277607           1A    chr1A
#2 1126457           7B    chr7B
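
To see what the sub() call does on its own, here is a minimal sketch run on two of the example SVEVOMAP values. The variable names svevo and stripped are only for illustration and are not part of the solution above.

```r
# Two SVEVOMAP values taken from the example table
svevo <- c("chr1A", "chr7B")

# Drop any leading non-digit characters and keep everything
# from the first digit onward (the part captured by the parentheses)
stripped <- sub("^[^[:digit:]]*(\\d+.*$)", "\\1", svevo)
stripped
# [1] "1A" "7B"
```

The result has the same form as CONSENSUSMAP, which is why the two columns can be compared with ==.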

Edit.

Here are two other ways, both with dplyr; the second one also uses the stringr package.

df1 %>%
  filter(!is.na(CONSENSUSMAP)) %>%
  rowwise() %>%
  filter(grepl(CONSENSUSMAP, SVEVOMAP))
#Source: local data frame [2 x 3]
#Groups: <by row>
#
## A tibble: 2 x 3
#  CLONEID CONSENSUSMAP SVEVOMAP
#  <chr>   <chr>        <chr>   
#1 2277607 1A           chr1A   
#2 1126457 7B           chr7B   


df1 %>%
  filter(!is.na(CONSENSUSMAP)) %>%
  filter(stringr::str_detect(SVEVOMAP, CONSENSUSMAP))
#  CLONEID CONSENSUSMAP SVEVOMAP
#1 2277607           1A    chr1A
#2 1126457           7B    chr7B
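
Note that grepl() and str_detect() treat CONSENSUSMAP as a regular expression and match it anywhere inside SVEVOMAP. If an exact comparison is preferred, a base R sketch could look like the following. It assumes that SVEVOMAP is always the prefix "chr" followed by the chromosome label, which holds for the example data but may not hold for the full dataset. The names keep and result are illustrative.

```r
# Example data, same values as in the Data section of this answer
df1 <- data.frame(
  CLONEID = c("1228104", "2277691", "2277607", "1e+08", "1229677", "1126457"),
  CONSENSUSMAP = c(NA, NA, "1A", NA, "1B", "7B"),
  SVEVOMAP = c("chr1A", "chr1A", "chr1A", "chr1A", "chr1A", "chr7B"),
  stringsAsFactors = FALSE
)

# Assumption: SVEVOMAP is "chr" + CONSENSUSMAP when the two maps agree.
# Rows with a missing CONSENSUSMAP are excluded by the !is.na() term.
keep <- !is.na(df1$CONSENSUSMAP) &
  df1$SVEVOMAP == paste0("chr", df1$CONSENSUSMAP)

result <- df1[keep, ]
result
```

This keeps the rows with CLONEID 2277607 and 1126457, as in the desired output; the original row names (3 and 6) are preserved.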

Data.

df1 <-
structure(list(CLONEID = c("1228104", "2277691", "2277607", "1e+08", 
"1229677", "1126457"), CONSENSUSMAP = c(NA, NA, "1A", NA, "1B", 
"7B"), SVEVOMAP = c("chr1A", "chr1A", "chr1A", "chr1A", "chr1A", 
"chr7B")), row.names = c(NA, -6L), class = "data.frame")