r dataframe subset grepl

Removing rows surrounding a grepl pattern match in R


I would like to remove rows from a data frame in R that contain a specific string, and also the row directly below each match. Removing the matching rows themselves is simple enough with grepl:

df[!grepl("my_string",df$V1),]

The part I am stuck on is how to also remove the row directly below each row that matches "my_string" in the example above.

Thank you for any suggestions anyone has!


Solution

  • grep returns the row numbers where the pattern matches. Add 1 to each of them to get the rows directly below, then remove both sets of rows.

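    inds <- grep("my_string", df$V1)
    # Keep every row that is neither a match nor directly below one.
    # setdiff() also covers the no-match case, where df[-integer(0), ] would return zero rows.
    result <- df[setdiff(seq_len(nrow(df)), c(inds, inds + 1)), ]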
    

    Using the tidyverse:

    library(dplyr)
    library(stringr)
    
    result <- df %>%
      filter({
        # str_detect() takes the string first, then the pattern
        inds <- str_detect(V1, "my_string")
        !(inds | lag(inds, default = FALSE))
        })
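
To check the idea on something small, here is a minimal sketch using a made-up data frame (the values in V1 are purely illustrative; only the column name and "my_string" come from the question). Instead of row numbers it builds a logical mask with grepl and shifts it down by one row, which is one possible variant of the approach above rather than the only way to do it. When nothing matches, the mask is all FALSE and every row is kept.

```r
# Hypothetical toy data, assumed for illustration only
df <- data.frame(
  V1 = c("a", "my_string 1", "b", "c", "my_string 2", "d"),
  stringsAsFactors = FALSE
)

# TRUE for rows that contain the pattern
hit <- grepl("my_string", df$V1)

# Shift the mask down by one so it marks the row directly below each match
after_hit <- c(FALSE, head(hit, -1))

# Keep rows that are neither a match nor directly below one;
# drop = FALSE keeps the result a data frame even with a single column
result <- df[!(hit | after_hit), , drop = FALSE]

print(result)
```

With this toy input, rows 2, 3, 5 and 6 are removed, leaving the rows whose V1 is "a" and "c".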