Search code examples
regexlanguage-agnosticline-breaks

Match linebreaks - \n or \r\n?


While writing this answer, I had to match exclusively on linebreaks instead of using the s-flag (dotall - dot matches linebreaks).

The sites usually used to test regular expressions behave differently when trying to match on \n or \r\n.

I noticed

  • Regex101 matches linebreaks only on \n
    (example - delete \r and it matches)

  • RegExr matches linebreaks neither on \n nor on \r\n
    and I can't find something to make it match a linebreak, except for the m-flag and \s
    (example)

  • Debuggex behaves even more different:
    in this example it matches only on \r\n, while
    here it only matches on \n, with the same flags and engine specified

I'm fully aware of the m-flag (multiline - makes ^ match the start and $ the end of a line), but sometimes this is not an option. Same with \s, as it matches tabs and spaces, too.

My thought to use the unicode newline character (\u0085) wasn't successful, so:

  1. Is there a failsafe way to integrate the match on a linebreak (preferably regardless of the language used) into a regular expression?
  2. Why do the above mentioned sites behave differently (especially Debuggex, matching once only on \n and once only on \r\n)?

Solution

  • I will answer in the opposite direction.

    1. For a full explanation about \r and \n I have to refer to this question, which is far more complete than I will post here: Difference between \n and \r?

    Long story short, Linux uses \n for a new-line, Windows \r\n and old Macs \r. So there are multiple ways to write a newline. Your second tool (RegExr) does for example match on the single \r.

    1. [\r\n]+ as Ilya suggested will work, but will also match multiple consecutive new-lines. (\r\n|\r|\n) is more correct.