A string (extracted from an Outlook email message body.innerText) contains embedded newlines. How can I split this into an array of strings?
I would expect this example string to be split into an array of two (2) items. Instead, it becomes an array of three (3) items with a blank line in the middle.
PS C:\src\t> ("This is`r`na string.".Split([Environment]::NewLine)) | % { $_ }
This is

a string.
PS C:\src\t> "This is `r`na string.".Split([Environment]::NewLine) | Out-String | Format-Hex
           00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000   54 68 69 73 20 69 73 20 0D 0A 0D 0A 61 20 73 74  This is ....a st
00000010   72 69 6E 67 2E 0D 0A                             ring...
To treat a CRLF sequence as a whole as the separator, it's simpler to use the -split operator, which is regex-based:
PS> "This is `r`n`r`n a string." -split '\r?\n'
This is 

 a string.
Note:
\r?\n matches both CRLF (Windows-style) and LF-only (Unix-style) newlines; use \r\n if you really only want to match CRLF sequences.
Note the use of a single-quoted string ('...'), so as to pass the string containing the regex as-is through to the .NET regex engine; the regex engine uses \ as the escape character, hence the use of \r and \n.
PowerShell's -split operator is a generally superior alternative to the [string] .NET type's .Split() method; see this answer.
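To make the difference between the two patterns in the notes above concrete, here is a minimal sketch. The sample string and the variable names are made up for illustration; the string deliberately mixes a CRLF and an LF-only newline.

```powershell
# Hypothetical sample text: one CRLF newline followed by one LF-only newline.
$text = "line 1`r`nline 2`nline 3"

# '\r?\n' treats a CRLF pair or a lone LF as a single separator.
$lines = $text -split '\r?\n'
$lines.Count      # 3

# '\r\n' matches CRLF only, so the lone LF stays inside the second element.
$crlfOnly = $text -split '\r\n'
$crlfOnly.Count   # 2
```

If the input may contain consecutive newlines and the resulting empty elements are unwanted, they can be filtered afterwards, for example with `$lines -ne ''`.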
As for what you tried:
The separator argument, [Environment]::NewLine, on Windows is the string "`r`n", i.e. a CRLF sequence.
In PowerShell [Core] v6+, your command does work, because this string as a whole is considered the separator.
In Windows PowerShell, as Steven points out in his helpful answer, the individual characters, CR and LF separately, are considered separators, resulting in an extra, empty element (the empty string between the CR and the LF) in the result array.
This change in behavior happened outside of PowerShell's control: .NET Core introduced a new .Split() method overload with a [string]-typed separator parameter, which PowerShell's overload-resolution algorithm now selects over the older overload with the [char[]]-typed parameter.
Avoiding such inadvertent (albeit rare) behavioral changes, which PowerShell itself cannot prevent, is another good reason to prefer the PowerShell-native -split operator over the .NET [string] type's .Split() method.
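If .Split() has to be used anyway, one way to get the same result in both editions is to cast the separator explicitly, so that overload resolution has nothing to guess. The following is a sketch under that assumption, with a made-up sample string and variable names; it relies on the Split(char[]) and Split(string[], StringSplitOptions) overloads, which exist in both .NET Framework and .NET Core.

```powershell
# Hypothetical sample text with a single CRLF newline.
$text = "This is`r`na string."

# Casting to [char[]] selects the char-array overload in both editions:
# CR and LF each act as a separator, leaving an empty string between them.
$byChars = $text.Split([char[]] "`r`n")
$byChars.Count    # 3

# Casting to [string[]] and passing a [StringSplitOptions] value selects
# Split(string[], StringSplitOptions), which treats CRLF as one separator.
$byString = $text.Split([string[]] "`r`n", [StringSplitOptions]::None)
$byString.Count   # 2
```

The first call reproduces the Windows PowerShell result described above; the second matches what `$text -split '\r\n'` returns.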