
Stop regex search at optional pattern


I'm trying to make a Regex pattern that can pull a few elements from an email. The email may or may not be forwarded. If it is not forwarded, it will match this format:

-match one
-match two
-match three
-and a bunch of notes here, potentially with more than 1 line or newlines included 
and there may be hyphens in this text as well

If it is forwarded, it will match this format:

-match one
-match two
-match three
-and a bunch of notes here, potentially with more than 1 line or newlines included 
and there may be hyphens in this text as well

---------- Forwarded message ----------
From:....

I'm having trouble making a pattern that will work for both cases and will capture everything between the 4th dash and the line that starts "------Forwarded...."

Here is the pattern I came up with as a placeholder: \-\s?(.+)\s\-\s?(.+)\s\-\s?(.+)\s\-\s?([^[-]*). However, this does not work when the text after the 4th dash contains hyphens, because the last group stops at the first hyphen it finds.


Solution

  • One option could be matching the 3 lines and only the dash of the fourth line. Then capture in a group all lines that do not start with a dash.

    ^(?:-.*\n){3}-((?:.*\n(?!-).*)*)
    
    • ^ Start of string
    • (?:-.*\n){3} Match 3 lines that start with a dash, each followed by a newline (use (?:-.*\n)+ to match 1 or more such lines)
    • - Match the fourth dash
    • ( Capture group 1
      • (?:.*\n(?!-).*)* Repeatedly match the rest of the current line, a newline, and the next line, as long as that next line does not start with a dash
    • ) Close group 1

    Regex demo
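
As a rough illustration, the pattern above could be tried in Python as follows. The sample strings and the names `NOTES_PATTERN` and `extract_notes` are assumptions made up for this sketch; they are not part of the original question. No flags are passed, so `^` anchors at the start of the string as described above.

```python
import re

# Pattern from the answer: three dash-prefixed lines, the fourth dash,
# then group 1 collects lines until the next line starts with a dash.
NOTES_PATTERN = re.compile(r"^(?:-.*\n){3}-((?:.*\n(?!-).*)*)")

# Hypothetical sample data modelled on the formats in the question.
plain_email = (
    "-match one\n"
    "-match two\n"
    "-match three\n"
    "-and a bunch of notes here\n"
    "and there may be hyphens in this text as well - like this\n"
)

forwarded_email = plain_email + (
    "\n"
    "---------- Forwarded message ----------\n"
    "From:....\n"
)


def extract_notes(text):
    """Return the text after the fourth dash, or None if nothing matches."""
    match = NOTES_PATTERN.search(text)
    # strip() drops the trailing newline(s) that group 1 picks up.
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    print(extract_notes(plain_email))
    print(extract_notes(forwarded_email))
```

Both calls should print the same two note lines, since the capture stops before the line of dashes in the forwarded case. If the block does not sit at the very start of the text, passing `re.MULTILINE` lets `^` match at the start of any line.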

    You can also stop only at the ---------- Forwarded message line, instead of at any line that starts with a dash, if there can be no overlap

    ^(?:-.*\n){3}-((?:.*\n(?!-+ Forwarded message).*)*)
    

    Regex demo

    But see this example for what else can be matched in that case.
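
A minimal sketch of the second pattern in Python, assuming the notes may contain a line that itself starts with a dash. The sample text and the names `FORWARD_AWARE` and `extract_notes_forward_aware` are hypothetical and only serve this illustration.

```python
import re

# Second pattern from the answer: keep collecting lines until the next line
# is the "---------- Forwarded message" header.
FORWARD_AWARE = re.compile(
    r"^(?:-.*\n){3}-((?:.*\n(?!-+ Forwarded message).*)*)"
)

# Hypothetical sample: one of the note lines starts with a dash.
email = (
    "-match one\n"
    "-match two\n"
    "-match three\n"
    "-notes start here\n"
    "- a note line that itself starts with a dash\n"
    "last note line\n"
    "\n"
    "---------- Forwarded message ----------\n"
    "From:....\n"
)


def extract_notes_forward_aware(text):
    """Return the notes after the fourth dash, up to the forwarded header."""
    match = FORWARD_AWARE.search(text)
    # strip() drops the trailing newline(s) that group 1 picks up.
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    print(extract_notes_forward_aware(email))
```

Here the dash-led note line stays inside the capture, whereas the first pattern would stop before it. Without a forwarded header, this version keeps matching to the end of the text.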