regex, regex-lookarounds, regex-group

How To Capture Positive Lookahead


I am trying to figure out how to capture the positive lookahead group in the following regex:

(((Initial commit)|(Merge [^\r\n]+)|(((build|chore|ci|docs|feat|fix|perf|refactor|revert|style|test|BREAKING CHANGE)(\(\w+\))?!?: ([\w ]+))(\r|\n|\r\n){0,2}((?:\w|\s|\r|\n|\r\n)+)(?=(((\r|\n|\r\n){2}([\w-]+): (\w+))|$)))))

The sample data I am trying to match is as follows:

#1

build(Breaking): la asdf asdf asdf

asdfasdf asdf asdf
asdf
asdf
asdf

asdf
asdf

asdf

aef asdf asdf

#2

build(Breaking): la asdf asdf asdf

asdfasdf asdf asdf
asdf
asdf
asdf

asdf
asdf

asdf

aef asdf asdf

asdf-asdf: asdf

I successfully capture all fields preceding the positive lookahead for asdf-asdf: asdf, whether or not that text is present. However, even when the lookahead finds asdf-asdf: asdf, the capturing group does not seem to capture it.

What should I be doing in order to accomplish this goal, or what am I doing wrong?


Solution

  • Your regex is long, but the problem comes down to one thing: a positive lookahead is zero-width. It asserts that some text follows but does not consume it, so that text never becomes part of the overall match. A simpler example is bad (?=tea), which, applied to bad tea, matches only bad (with its trailing space) and not bad tea. If you use bad (?=(tea))\1 instead, the match covers the entire string, because the backreference \1 consumes the text that the group inside the lookahead captured. The corrected regex is:

    (((Initial commit)|(Merge [^\r\n]+)|(((build|chore|ci|docs|feat|fix|perf|refactor|revert|style|test|BREAKING CHANGE)(\(\w+\))?!?: ([\w ]+))(\r|\n|\r\n){0,2}((?:\w|\s|\r|\n|\r\n)+)(?=(((\r|\n|\r\n){2}([\w-]+): (\w+))|$))\12)))
    

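    Simply add \12 after the lookahead (or repeat the pattern from inside the lookahead at that point). Group 12 is the outermost capturing group inside the lookahead, so \12 consumes exactly the text that the lookahead asserted.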
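
The difference between asserting and consuming can be checked with a short script. This is only a sketch in Python's re module (the question does not say which engine is in use, so that choice is an assumption), and the names HEAD, TAIL, original, fixed and sample are made up for illustration. The sample is an abbreviated version of dataset #2 from the question.

```python
import re

# Simple case: the lookahead asserts "tea" but does not consume it.
m1 = re.search(r"bad (?=(tea))", "bad tea")
print(repr(m1.group(0)))  # 'bad '  (overall match stops before "tea")
print(repr(m1.group(1)))  # 'tea'   (the group inside the lookahead is still filled)

# Adding \1 after the lookahead consumes what group 1 captured.
m2 = re.search(r"bad (?=(tea))\1", "bad tea")
print(repr(m2.group(0)))  # 'bad tea'

# The regex from the question, split so the two variants share one prefix.
HEAD = (
    r"(((Initial commit)|(Merge [^\r\n]+)|"
    r"(((build|chore|ci|docs|feat|fix|perf|refactor|revert|style|test|BREAKING CHANGE)"
    r"(\(\w+\))?!?: ([\w ]+))"
    r"(\r|\n|\r\n){0,2}"
    r"((?:\w|\s|\r|\n|\r\n)+)"
    r"(?=(((\r|\n|\r\n){2}([\w-]+): (\w+))|$))"
)
TAIL = r")))"

original = re.compile(HEAD + TAIL)        # lookahead only
fixed = re.compile(HEAD + r"\12" + TAIL)  # lookahead followed by \12

# Abbreviated form of sample #2 (no trailing newline).
sample = (
    "build(Breaking): la asdf asdf asdf\n\n"
    "asdfasdf asdf asdf\nasdf\n\n"
    "aef asdf asdf\n\n"
    "asdf-asdf: asdf"
)

before = original.search(sample)
after = fixed.search(sample)

print(before.group(0).endswith("aef asdf asdf"))  # True: trailer not in the match
print(repr(before.group(12)))                     # '\n\nasdf-asdf: asdf'
print(after.group(0) == sample)                   # True: trailer now consumed
```

In this engine, group 12 already holds the trailer text even without the backreference; what \12 changes is that the trailer also becomes part of the overall match. When the trailer is absent, the lookahead succeeds through the $ branch, group 12 is empty, and \12 matches the empty string, so inputs like sample #1 should still match.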