
Regex to consume between two groups where the second group is optional


I have the following strings:

Sally: Hello there #line:34de2f
Bob: How are you today?

These strings have up to three parts:

  • The "name"; Sally: and Bob:
  • The "text"; Hello there and How are you today?
  • An optional "line identifier"; #line:34de2f

I want to grab the "text" between the "name" and the optional "line identifier" using a regex.

This seems like what negative lookaheads are for:

(?<=:).*?(?!#line:.*)$

But this still captures the "line identifier".
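
The behaviour can be reproduced with a short sketch. The question does not name a regex flavor, so Python's re module is assumed here. The `$` anchor forces the lazy `.*?` to run to the end of the string, and at that position nothing follows, so the negative lookahead succeeds trivially:

```python
import re

# The attempted pattern from the question (Python's re module is an assumption).
attempt = re.compile(r"(?<=:).*?(?!#line:.*)$")

line = "Sally: Hello there #line:34de2f"
match = attempt.search(line)

# The match starts right after the first colon and must reach `$`.
# At the end of the string there is no "#line:" ahead, so (?!...) passes
# and the identifier ends up inside the match.
captured = match.group(0)
print(repr(captured))  # ' Hello there #line:34de2f'
```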

The following works, but I do not want to actually capture the "line identifier":

(?<=:).*?(#line:.*)?$

Solution

  • You may try using

    (?<=:\s).*?(?=\s*#line:.*|$)
    

    See this regex demo. Details:

    • (?<=:\s) - a location immediately preceded by : and a whitespace character
    • .*? - any 0 or more chars other than line break chars, as few as possible
    • (?=\s*#line:.*|$) - a location immediately followed by either 0+ whitespaces, a #line: string and the rest of the line, or the end of the string.
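
As a minimal sketch of the lookaround pattern in use, assuming Python's re module (the question does not name a flavor) and a hypothetical helper name `extract_text`:

```python
import re

# Lookbehind for ": " and lookahead for an optional "#line:" tail or end of string.
LOOKAROUND = re.compile(r"(?<=:\s).*?(?=\s*#line:.*|$)")

def extract_text(line):
    """Return the text between the name and the optional line identifier."""
    match = LOOKAROUND.search(line)
    # The whole match is the text, since lookarounds consume no characters.
    return match.group(0) if match else None

print(extract_text("Sally: Hello there #line:34de2f"))  # Hello there
print(extract_text("Bob: How are you today?"))          # How are you today?
```

The lookbehind is fixed-width (a colon plus exactly one whitespace character), which Python's re module requires.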

    You may also use

    :\s*(.*?)(?:\s*#line:.*)?$
    

    See the regex demo. Get the contents in Group 1.

    Details

    • :\s* - a colon and then 0 or more whitespaces
    • (.*?) - Capturing group #1: any zero or more chars other than line break chars, as few as possible
    • (?:\s*#line:.*)? - an optional sequence of
      • \s* - 0+ whitespaces
      • #line: - a literal #line: string
      • .* - any zero or more chars other than line break chars, as many as possible
    • $ - end of string.
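
The capturing-group variant can be sketched the same way, again assuming Python's re module; `extract_text_group` is a hypothetical helper name. The text is read from Group 1 rather than from the whole match:

```python
import re

# Group 1 holds the text; the "#line:" tail is matched by an optional
# non-capturing group, so it is consumed but not returned.
GROUPED = re.compile(r":\s*(.*?)(?:\s*#line:.*)?$")

def extract_text_group(line):
    """Return Group 1: the text after the name, without the line identifier."""
    match = GROUPED.search(line)
    return match.group(1) if match else None

print(extract_text_group("Sally: Hello there #line:34de2f"))  # Hello there
print(extract_text_group("Bob: How are you today?"))          # How are you today?
```

Unlike the lookbehind version, `:\s*` tolerates any amount of whitespace after the colon, including none.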