Extract a URL from a text file next to a certain string

I have a large text file that contains something like:

View this email in your browser (https://us15.campaign-archive.com/?e=3D1460&u=3Df6e2bb1612577510b&id=3D2c8be)

Sometimes, part of the URL goes onto the next line.

I simply need to extract that URL using PowerShell, without the brackets (parentheses), so that I can download it as an HTML file.

I've tried doing this in batch, which I'm most familiar with, but it's proving impossible, and it seems this would be possible in PowerShell.

Solution

The following uses regex-based operators and .NET APIs.

In both solutions, -replace '\r?\n' removes any embedded newlines (line breaks) from the URL(s) found: the -replace operator, given no replacement operand, replaces each match with the empty string, and \r?\n is a regex that matches both Windows-format CRLF and Unix-format LF-only newlines.
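
To see the newline removal in isolation, here is a minimal sketch. The two sample strings (and the example.org URL in them) are made up for illustration; they hold the same wrapped URL, once with a CRLF and once with an LF-only line break.

```powershell
# Hypothetical sample strings: the same wrapped URL with CRLF and with LF-only newlines.
$crlf = "https://example.org/?a=1`r`n&b=2"
$lf   = "https://example.org/?a=1`n&b=2"

# With no replacement operand, -replace substitutes the empty string,
# i.e. it removes every match of the regex.
$fromCrlf = $crlf -replace '\r?\n'
$fromLf   = $lf   -replace '\r?\n'

$fromCrlf   # https://example.org/?a=1&b=2
$fromLf     # https://example.org/?a=1&b=2
```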

If you only need a single URL (the first one found), use the -match operator, which - if it returns $true - reports what was matched in the automatic $Matches variable.

# Sample multi-line input string.
# To read such a string from a file, use, e.g.:
#     $str = Get-Content -Raw file.txt
$str = @'
  Initial text.

  View this email in your browser (https://us15.campaign-archive.com/?e=3D1460&u=3Df6e2b
b1612577510b&id=3D2c8be)

  More text.
'@

# Find the (first) embedded URL...
if ($str -match '(?<=\()https?://[^)]+') {
  # ... remove any line breaks from it, and output the result.
  $Matches.0 -replace '\r?\n'
}
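
Since the question starts from a file and ends with a download, the snippet above could be wrapped roughly as follows. This is only a sketch: the function name Get-FirstUrlFromFile and the file names mail.txt and mail.html are placeholders that are not part of the original answer, and the download step is left commented out because it needs network access.

```powershell
# Hypothetical helper (not part of the original answer): returns the first
# parenthesized http(s) URL found in a file, with embedded line breaks removed.
function Get-FirstUrlFromFile {
  param([Parameter(Mandatory)] [string] $Path)

  # -Raw reads the whole file as a single string, so a URL that wraps
  # onto the next line can still be matched as one unit.
  $content = Get-Content -Raw -LiteralPath $Path

  if ($content -match '(?<=\()https?://[^)]+') {
    $Matches.0 -replace '\r?\n'
  }
}

# Example use; mail.txt and mail.html are placeholder file names.
# $url = Get-FirstUrlFromFile -Path mail.txt
# Invoke-WebRequest -Uri $url -OutFile mail.html
```

Note that the =3D sequences in the sample URL look like quoted-printable encoding of =. If that is the case for your file, the extracted URL may need decoding (e.g. -replace '=3D', '=') before it can be downloaded.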

If you need all matches (or a fixed number of them), call the System.Text.RegularExpressions.Regex.Matches .NET API directly, because -match only ever reports the first match:

# Extract *all* URLs and remove any embedded line breaks from each
[regex]::Matches(
  $str, 
  '(?<=\()https?://[^)]+'
).Value -replace '\r?\n'
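
For the fixed-count case, one possible approach is to pipe the matches through Select-Object -First. The sketch below assumes a made-up input string with three parenthesized example.org URLs and keeps only the first two.

```powershell
# Hypothetical input containing three parenthesized URLs.
$text = 'a (https://example.org/1) b (https://example.org/2) c (https://example.org/3)'

# Enumerate all matches, keep the first two, and strip line breaks from each.
$firstTwo = [regex]::Matches($text, '(?<=\()https?://[^)]+') |
  Select-Object -First 2 |
  ForEach-Object { $_.Value -replace '\r?\n' }

$firstTwo   # https://example.org/1 and https://example.org/2, one per line
```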

For an explanation of the first regex and the ability to experiment with it, see this regex101.com page.
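
As a rough inline breakdown, the same regex can be written with the (?x) inline option, which makes the .NET regex engine ignore pattern whitespace and #-comments. The sample string and the variable names here are made up for illustration.

```powershell
# The regex from above, spread out and annotated via the (?x) inline option.
$pattern = @'
(?x)          # free-spacing mode: pattern whitespace and comments are ignored
(?<=\()       # lookbehind: a literal opening parenthesis must precede the match
https?://     # the scheme, either http:// or https://
[^)]+         # one or more characters that are not a closing parenthesis
'@

# Hypothetical sample with a URL that wraps onto a second line.
$sample = "See (https://example.org/?a=1`n&b=2) for details."

if ($sample -match $pattern) {
  $url = $Matches.0 -replace '\r?\n'
  $url   # https://example.org/?a=1&b=2
}
```

Because [^)] is a negated character class, it also matches newlines, which is why a URL that wraps onto the next line is still captured in full; the lookbehind keeps the opening parenthesis out of the result.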