Tags: .net, regex, perl, whitespace, multiline

Regular expression to match any vertical whitespace


Is there a regex pattern for .NET that matches any character that breaks text onto a new line, i.e. any vertical whitespace character, as Perl regex does with \v? In other words, is there a way to match \r (carriage return), \n (line feed), \v (vertical tab), and \f (form feed), as well as the Unicode characters U+0085 (next line), U+2028 (line separator), and U+2029 (paragraph separator), and any other characters I'm not aware of that might result in more than one line?

I'm writing some validation code in .NET that will fail if a user has provided input text that contains more than one line. In most cases, that means I just have to check for \r and \n. However, I know there is a multitude of other vertical whitespace characters.

I know .NET regex differs from Perl regex, most importantly in that \v in .NET matches only the vertical tab (U+000B), whereas in Perl it matches any vertical whitespace.


Solution

  • As you say, the Perl character class \v matches [\x0A-\x0D] (line feed, vertical tab, form feed, and carriage return, although I would dispute that CR is vertical whitespace), plus the non-ASCII code points [\x{85}\x{2028}\x{2029}] (next line, line separator, and paragraph separator).

    You can hand-build this character class in .NET. Note that the .NET \u escape takes exactly four hexadecimal digits, so the ASCII range must be written as \u000A-\u000D:

    [\u000A-\u000D\u0085\u2028\u2029]
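
For the validation scenario in the question, a hand-built class like this could be wrapped in a small helper. The following is only a minimal sketch: the class name `SingleLineValidator` and the method `IsSingleLine` are hypothetical names chosen for illustration, and the set of characters is the Perl \v set (U+000A to U+000D, U+0085, U+2028, U+2029), written with four-digit \u escapes as .NET requires.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class SingleLineValidator
{
    // Hand-built stand-in for Perl's \v: U+000A-U+000D, U+0085, U+2028, U+2029.
    // The verbatim string passes the \u escapes through to the regex engine,
    // which requires exactly four hex digits after \u.
    private static readonly Regex VerticalWhitespace =
        new Regex(@"[\u000A-\u000D\u0085\u2028\u2029]", RegexOptions.CultureInvariant);

    // Returns true when the input contains no vertical whitespace character.
    public static bool IsSingleLine(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        return !VerticalWhitespace.IsMatch(input);
    }
}
```

A caller would reject the input whenever `SingleLineValidator.IsSingleLine(text)` returns false. Horizontal whitespace such as tabs and spaces is deliberately left out of the class, so it does not fail validation.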