Ok so I know there are some Regex questions out here on lookahead and lookbehind, but I haven't found answers to my own questions that I can easily relate to (...oh well).
So here's how I understand Regex lookahead and lookbehind!
LA/LB preceding main Regex
(?=IF_YOU_FIND_WHAT_IS_HERE)START_MATCHING_WHAT_IS_HERE
(?!IF_YOU_DO_NOT_FIND_WHAT_IS_HERE)START_MATCHING_WHAT_IS_HERE
LA/LB succeeding main Regex
START_MATCHING_WHAT_IS_HERE(?=UNTIL_THIS IS_NOT TRUE)
START_MATCHING_WHAT_IS_HERE(?!UNTIL_THIS IS_NOT TRUE)
Ok so for the second part ( succeeding ), I'm really not sure and I would appreciate some rewriting of the above notations or some thumbs up for my excellent understanding (oh yeah).
So back on earth, as I understand it, after each character it matches in the "main" Regex...
Let's look at this Regex
(?<=REGEX_1)(?<!REGEX_2)((MAIN_REGEX(?<!REGEX_3))(?=REGEX_4))
My strategy in approaching this would be that, in some cases, we could combine REGEX_1 and REGEX_2. If that were the case, we would have:
(?<=REGEX_C)((MAIN_REGEX(?<!REGEX_3))(?=REGEX_4))
C for : Combined
Essentially, what I understand is that :
I have no clue if what I wrote is accurate haha. It's too messy when I want to try it out. Most of the time I succeed by trial and error, but I would like to have some clarifications so I can get it on my first try. Boom
Thanks for your replies!
The key to understanding assertions is that they all involve
looking in a direction from BETWEEN characters, not at, on, before, or after
a character, or anything else you can think of.
Since they are between characters, they have a priority for analysis by the
regex engine.
The priority for character matching is from left to right.
So is the reading order of a regex.
The priority for assertions is:
An assertion before something is checked first.
An assertion after something is checked last.
And, the position between characters is where it's checked.
You have to imagine yourself at that position when you write the assertion.
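To make the "position between characters" idea concrete, here is a small sketch using Python's `re` module. The sample strings and variable names are made up for illustration; the point is only to show where each assertion is checked.

```python
import re

text = "price: 100 USD, 200 EUR"

# An assertion consumes nothing: it matches at a position between characters.
m = re.search(r"(?=USD)", text)
zero_width = (m.start() == m.end())          # the match itself is empty
next_chars = text[m.start():m.start() + 3]   # what sits right after that position

# Assertion BEFORE the main pattern: checked first, at the position where
# the match would start. Here: "is there a '$' just behind me?"
after_dollar = re.findall(r"(?<=\$)\d+", "cost $42 and 17")

# Assertion AFTER the main pattern: checked last, at the position right
# after the digits. Here: "does ' USD' come next?"
usd = re.findall(r"\d+(?= USD)", text)

# Same position, negated: "' USD' must NOT come next".
# The \b stops the engine from backtracking to "10" to dodge the assertion.
not_usd = re.findall(r"\b\d+\b(?! USD)", text)

print(zero_width, next_chars, after_dollar, usd, not_usd)
# → True USD ['42'] ['100'] ['200']
```

Note that none of the results contain `$` or ` USD`: the assertions only look, they never become part of the match.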
Update with more explanation
Usually, the best way to get better used to assertions is to look at examples.
This is your template expression as I see it.
(?<= REGEX_1 ) # Here, between two characters, look behind for a certain set of chars
(?<! REGEX_2 ) # At the same place, look behind to check that a char subset is NOT there
( # (1 start)
MAIN_REGEX # Some data to match
) # (1 end)
(?<! REGEX_3 ) # Here is between the last char matched in group 1
# and the next character yet to be matched.
# Look behind at the last char(s) matched in group 1
# and make sure they are NOT within a set of chars.
(?= REGEX_4 ) # At the same place, look ahead to check that a subset of chars is there
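If it helps, the template can be filled in with toy sub-patterns and run. Everything plugged in below (`\$`, `\$\$`, `\d+`, `0`, ` USD`) and the sample string are my own placeholders, not something from your question; this is just a sketch in Python with `re.VERBOSE` so the layout mirrors the template.

```python
import re

pattern = re.compile(r"""
    (?<= \$ )      # REGEX_1: just behind this position there must be a '$'
    (?<! \$\$ )    # REGEX_2: same position, but there must NOT be '$$' behind it
    ( \d+ )        # MAIN_REGEX: group 1, the digits we actually want
    (?<! 0 )       # REGEX_3: after group 1, the last char matched must NOT be '0'
    (?= [ ]USD )   # REGEX_4: same position, ' USD' must come next
""", re.VERBOSE)

sample = "$15 USD, $$25 USD, $30 USD, $45 EUR, 55 USD"

# findall returns the contents of group 1 for every match.
matches = pattern.findall(sample)
print(matches)
# → ['15']
```

Each rejected candidate fails a different assertion: `$$25` fails REGEX_2, `$30` fails REGEX_3 (and backtracking to `3` then fails REGEX_4), `$45 EUR` fails REGEX_4, and `55` fails REGEX_1.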
Here is something more concrete.
This is how a regex would look for the word boundary construct \b.
The word boundary actually only exists between characters.
It looks in both directions in two different ways to satisfy itself.
Study this for a while.
(?: # Cluster start
(?: # -------
^ # Beginning of string anchor
| # or,
(?<= [^a-zA-Z0-9_] ) # Lookbehind assertion for a char that is NOT a word char
) # -------
(?= [a-zA-Z0-9_] ) # Lookahead assertion for a char that IS a word char
| # or,
(?<= [a-zA-Z0-9_] ) # Lookbehind assertion for a char that IS a word char
(?: # -------
$ # End of string anchor
| # or,
(?= [^a-zA-Z0-9_] ) # Lookahead assertion for a char that is NOT a word char
) # -------
) # Cluster end
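One way to study it is to check that the hand-written version finds the same positions as the built-in \b. The sketch below does that in Python; the helper name `boundary_positions` and the sample string are mine, and `re.ASCII` is assumed so that \b uses the same `[a-zA-Z0-9_]` definition of a word character as the expression above.

```python
import re

# The expanded word boundary, written with re.VERBOSE so whitespace is ignored.
MANUAL_BOUNDARY = re.compile(r"""
    (?:
        (?: ^ | (?<= [^a-zA-Z0-9_] ) )   # nothing, or a non-word char, behind
        (?= [a-zA-Z0-9_] )               # and a word char ahead
      |
        (?<= [a-zA-Z0-9_] )              # a word char behind
        (?: $ | (?= [^a-zA-Z0-9_] ) )    # and nothing, or a non-word char, ahead
    )
""", re.VERBOSE)

# re.ASCII restricts \b to ASCII word characters, matching the classes above.
BUILTIN_BOUNDARY = re.compile(r"\b", re.ASCII)

def boundary_positions(pattern, text):
    """Return every between-characters position where the pattern matches."""
    return [m.start() for m in pattern.finditer(text)]

sample = "foo bar_1, baz!"
manual = boundary_positions(MANUAL_BOUNDARY, sample)
builtin = boundary_positions(BUILTIN_BOUNDARY, sample)
print(manual)
print(manual == builtin)
# → [0, 3, 4, 9, 11, 14]
# → True
```

Every reported position is a gap between two characters (or the start of the string), never a character itself, which is the whole point of the construct.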