I'm trying to make a Regex pattern that can pull a few elements from an email. The email may or may not be forwarded. If it is not forwarded, it will match this format:
-match one
-match two
-match three
-and a bunch of notes here, potentially with more than 1 line or newlines included
and there may be hyphens in this text as well
If it is forwarded, it will match this format:
-match one
-match two
-match three
-and a bunch of notes here, potentially with more than 1 line or newlines included
and there may be hyphens in this text as well
---------- Forwarded message ----------
From:....
I'm having trouble making a pattern that will work for both cases and will capture everything between the 4th dash and the line that starts "------Forwarded...."
Here is the pattern I came up with as a placeholder:
\-\s?(.+)\s\-\s?(.+)\s\-\s?(.+)\s\-\s?([^[-]*)
However, this does not work when the text after the 4th dash has hyphens in it, because the match cuts off at the first hyphen it finds.
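The cut-off can be reproduced with a short sketch. It assumes Python's re module (the regex flavor is not stated above) and uses made-up sample text with a hyphen inside the notes; the names PLACEHOLDER and sample are only for illustration.

```python
import re

# The placeholder pattern from the question, unchanged.
PLACEHOLDER = re.compile(r"\-\s?(.+)\s\-\s?(.+)\s\-\s?(.+)\s\-\s?([^[-]*)")

# Hypothetical sample email; the notes contain a hyphen on the second line.
sample = (
    "-match one\n"
    "-match two\n"
    "-match three\n"
    "-notes line one\n"
    "with a hyphen - in the middle\n"
)

m = PLACEHOLDER.search(sample)
print(m.groups()[:3])    # the first three fields are captured as intended
print(repr(m.group(4)))  # the notes stop just before the first hyphen
```

The last group is the negated class [^[-]*, which cannot consume a hyphen, so everything from the first hyphen in the notes onward is lost.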
One option could be matching the 3 lines and only the dash of the fourth line. Then capture in a group all lines that do not start with a dash.
^(?:-.*\n){3}-((?:.*\n(?!-).*)*)
^  Start of string
(?:-.*\n){3}  Match 3 lines that start with a dash, each followed by a newline (use (?:-.*\n)+ to match 1 or more lines)
-  Match the fourth dash
(  Capture group 1
(?:.*\n(?!-).*)*  Match all lines that do not start with a dash
)  Close group 1
You can also exclude matching only the ---------- Forwarded message line, if there can be no overlap:
^(?:-.*\n){3}-((?:.*\n(?!-+ Forwarded message).*)*)
But see in this example what else the matches can then include in that case.
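As a rough check, both patterns above can be tried on sample emails. This is a minimal sketch assuming Python's re module; the sample text and the names NOTES, NOTES_UNTIL_FORWARD and extract_notes are hypothetical. The result is stripped because the captured group keeps a trailing newline when the email ends with one.

```python
import re

# First pattern: stop at any line that starts with a dash.
NOTES = re.compile(r"^(?:-.*\n){3}-((?:.*\n(?!-).*)*)")
# Second pattern: stop only at the "Forwarded message" separator line.
NOTES_UNTIL_FORWARD = re.compile(
    r"^(?:-.*\n){3}-((?:.*\n(?!-+ Forwarded message).*)*)"
)

FORWARD = "---------- Forwarded message ----------\nFrom: someone\n"
HEADER = "-match one\n-match two\n-match three\n"

plain = HEADER + "-notes, first line\nsecond line - with hyphens - inside\n"
forwarded = plain + FORWARD
# Notes whose second line itself starts with a dash.
dashed = HEADER + "-notes, first line\n- a bullet inside the notes\n" + FORWARD


def extract_notes(email, pattern=NOTES):
    """Return the text after the fourth dash, or None if nothing matches."""
    m = pattern.search(email)
    return m.group(1).strip() if m else None


print(repr(extract_notes(plain)))
print(repr(extract_notes(forwarded)))
print(repr(extract_notes(dashed, NOTES_UNTIL_FORWARD)))
```

With the first pattern, a notes line that begins with a dash ends the capture early, which is the trade-off between the two variants.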