Compressing multiple line breaks \n and tabs \t from a string in R


I tried to use

gsub('(\t\\n)+','\n',.)

to compress multiple \n and \t characters into a single \n, but it didn't work.

I'm kinda confused by regex, so can anyone help me? Please find the R console screenshot below:

[Screenshot: R console]


Solution

  • You can use

    gsub("\t*\n[\t\n]*", "\n", x)
    

    This replaces every run of tabs and newlines that contains at least one newline char with a single newline (LF) char.

    Here is a demo in R:

    x <- "A\tB\t\n\n\n\nD\tF"
    gsub("\t*\n[\t\n]*", "\n", x)
    ## => [1] "A\tB\nD\tF"
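    The pattern only fires on runs that contain at least one newline, and it absorbs tabs on both sides of that newline. A minimal sketch with two made-up inputs (x1 and x2 are illustrative only) shows that a tab-only run is left alone, while tab indentation at the start of the following line is removed together with the newline run:

    ```r
    x1 <- "A\t\tB"        # tabs only, no newline: nothing to match
    x2 <- "A\n\t\tB\t\n"  # LF followed by tabs, then a trailing TAB + LF

    r1 <- gsub("\t*\n[\t\n]*", "\n", x1)
    r2 <- gsub("\t*\n[\t\n]*", "\n", x2)

    print(r1)  # [1] "A\t\tB"  (unchanged)
    print(r2)  # [1] "A\nB\n"
    ```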
    

    Details:

    • \t* - zero or more TAB chars
    • \n - a newline (LF) char
    • [\t\n]* - zero or more TAB or LF chars
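    To see why the attempt in the question did not work, compare both patterns on the same sample string (the question's `.` placeholder, presumably a magrittr pipe argument, is replaced by `x` here). `(\t\\n)+` only matches the exact two-char sequence TAB + LF, repeated, so an LF that is not directly preceded by a TAB is never matched:

    ```r
    x <- "A\tB\t\n\n\n\nD\tF"

    # Pattern from the question: one or more repetitions of the exact pair TAB + LF
    attempt <- gsub('(\t\\n)+', '\n', x)
    # Pattern from this answer: any TAB/LF run that contains at least one LF
    answer <- gsub("\t*\n[\t\n]*", "\n", x)

    print(attempt)  # [1] "A\tB\n\n\n\nD\tF"  (only the single TAB + LF pair was replaced)
    print(answer)   # [1] "A\tB\nD\tF"
    ```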

    If you also need to handle CR as an optional char (for example, Windows CRLF line endings), use

    gsub("[\t\r]*\n[\t\r\n]*", "\n", x)
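    A quick comparison on a hypothetical CRLF string (the value of w is made up for illustration) shows why the CR variant matters: with the LF-only pattern, each CR breaks the run, so the blank lines survive:

    ```r
    # Hypothetical sample: CRLF line endings, a blank line and a stray tab
    w <- "A\tB\r\n\r\n\t\r\nC"

    lf_only <- gsub("\t*\n[\t\n]*", "\n", w)
    with_cr <- gsub("[\t\r]*\n[\t\r\n]*", "\n", w)

    print(lf_only)  # [1] "A\tB\r\n\r\n\r\nC"  (CRs split the run, blank lines remain)
    print(with_cr)  # [1] "A\tB\nC"
    ```

    Note that the result always uses LF line endings; if you need CRLF in the output, pass "\r\n" as the replacement instead.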