
Remove specific string or blank member from character vector

I am scraping pages to retrieve the header and details from each one. But along with the header and details, a telephone number and a blank string come back in the retrieved list for every page.

[1] "See our simple, animated definitions of types of corruption and the ways to challenge it."
[2] "Judiciary - Commenting on Justice Bean’s sentencing in the BAE Systems’ Tanzania case, Transparency International UK welcomed the Judge’s stringent remarks concerning BAE Systems’ past conduct."
[3] " "
[4] "+49 30 3438 20 666"

I have tried the following code, but it didn't work.

html %>% str_remove('+49 30 3438 20 666') %>% str_remove(' ').

How can these elements be removed?


  • In case you want to drop all lines that start with a + and end with a number:

    dd <- c(
      "See our simple, animated definitions of types of corruption and the ways to challenge it.",
      "Judiciary - Commenting on Justice Bean’s sentencing in the BAE Systems’ Tanzania case, Transparency International UK welcomed the Judge’s stringent remarks concerning BAE Systems’ past conduct.",
      " ",
      "+49 30 3438 20 666"
    )
    # Keep only the elements that do NOT start with "+" and end with a digit.
    # The result is not named `c`, because that would mask base::c().
    res <- dd[!grepl("^\\+.*\\d$", dd)]

    You can also use \\s (a single whitespace character) and \\d{2} (exactly two digits) to match the exact format, to be on the safe side, if all numbers have the same format. Note that you can also use the pattern in str_remove, but the end result is then an empty string rather than a dropped element. grepl instead returns a logical vector that you can use to subset your character vector.
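
    As a rough sketch of the exact-match idea, assuming every number follows the same layout as the one in the question (the names dd, phone, is_phone, kept and blanked are illustrative, and base R's sub() stands in here for str_remove so the snippet needs no packages). Note that the + has to be escaped as \\+, since an unescaped + is a regex quantifier:

    ```r
    dd <- c("Some header text", " ", "+49 30 3438 20 666")

    # Assumed layout: "+", 2 digits, then groups of 2, 4, 2 and 3 digits,
    # each separated by a single whitespace character
    phone <- "^\\+\\d{2}\\s\\d{2}\\s\\d{4}\\s\\d{2}\\s\\d{3}$"

    # grepl() returns one TRUE/FALSE per element; negating it keeps the non-matches
    is_phone <- grepl(phone, dd)
    kept <- dd[!is_phone]

    # sub() only blanks out the matched text; the element itself stays in the vector
    blanked <- sub(phone, "", dd)
    ```

    Here kept has two elements left, while blanked still has three, the last one being an empty string. That is why removing the match alone does not shorten the vector.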

    If you also want to delete all empty lines:
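
    One way this could look, as a minimal sketch that treats an element as empty when it is "" or contains only whitespace (dd and non_empty are illustrative names):

    ```r
    dd <- c("Some header text", " ", "", "+49 30 3438 20 666")

    # "^\\s*$" matches strings that consist of zero or more whitespace characters only
    non_empty <- dd[!grepl("^\\s*$", dd)]
    ```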


    Note that you can do both at the same time by using "|":
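
    A possible combined version, again only a sketch with illustrative names; the two alternatives on either side of | are the "starts with + and ends with a digit" pattern and a whitespace-only pattern:

    ```r
    dd <- c("Some header text", " ", "+49 30 3438 20 666")

    # Drop an element if it matches either alternative of the pattern
    cleaned <- dd[!grepl("^\\+.*\\d$|^\\s*$", dd)]
    ```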


    You can get familiar with regex here: