r regex line-breaks

Removing line breaks that match a regular expression in R


I have a character string from which I would like to remove only the line breaks that are immediately followed by a lowercase letter. For example, my string might contain:

one line of text \r\n another line \r\nof text,

which would show up as:

one line of text

another line

of text,

In this example, I would only want to remove the second line break, so that the text would then read:

one line of text

another line of text,

I know that the pattern is "\r\n[a-z]", and so the code should be something like

gsub("\r\n[a-z]", "", txt)

but I cannot come up with code that removes the line break while retaining the lowercase letter.

Thanks!


Solution

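  • We can use a regex lookahead, which needs perl = TRUE. The (?=[a-z]) part only checks that a lowercase letter follows the line break; it is not part of the match, so only the "\r\n" is removed and the letter stays.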

    # example string from the question
    txt <- "one line of text \r\n another line \r\nof text,"
    txtN <- gsub("\r\n(?=[a-z])", "", txt, perl = TRUE)
    cat(txtN, sep="\n")
    # one line of text 
    # another line of text,
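
As an alternative sketch (not part of the answer above), a capture group with a backreference should give the same result without perl = TRUE: the lowercase letter is matched, captured, and then put back by the replacement. The names txt and txtN2 are only illustrative, and txt is assumed to be the example string from the question.

```r
# Example input taken from the question (assumed value of txt)
txt <- "one line of text \r\n another line \r\nof text,"

# Capture the lowercase letter and restore it with the backreference \\1,
# so only the "\r\n" in front of it is dropped
txtN2 <- gsub("\r\n([a-z])", "\\1", txt)

# Lookahead version from the answer, for comparison
txtN <- gsub("\r\n(?=[a-z])", "", txt, perl = TRUE)

# Both approaches should produce the same string
identical(txtN, txtN2)
```

The lookahead keeps the letter out of the match entirely, while the capture group consumes it and writes it back; for this input the two are interchangeable.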