RegEx:
<span style='.+?'>TheTextToFind</span>
HTML:
<span style='font-size:11.0pt;'>DON'T_WANT_THIS_MATCHED <span style='font-size:18.0pt;'>TheTextToFind</span></span>
Why does the match include this?
<span style='font-size:11.0pt;'>DON'T_WANT_THIS_MATCHED
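The regex engine always returns the leftmost match: it tries each starting position from left to right and reports the first one at which the whole pattern succeeds. Laziness only affects how far the match extends from that start, not where it starts. That's why you get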
<span style='font-size:11.0pt;'>DON'T_WANT_THIS_MATCHED <span style='font-size:18.0pt;'>TheTextToFind</span>
as a match (basically the whole input, minus the final </span>).
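A quick way to see the leftmost rule in action is Python's re module (Python is only an assumption for illustration; other backtracking engines should behave the same on this input). The match begins at index 0, even though a shorter match exists if you start at the inner <span>:

```python
import re

html = ("<span style='font-size:11.0pt;'>DON'T_WANT_THIS_MATCHED "
        "<span style='font-size:18.0pt;'>TheTextToFind</span></span>")

pattern = re.compile(r"<span style='.+?'>TheTextToFind</span>")
m = pattern.search(html)

# The engine accepts the first start position where the pattern can succeed.
print(m.start())  # 0
print(m.group())

# A shorter match does exist at the inner <span>, but it is never reached
# because a match was already found at an earlier start position.
inner = html.index("<span style='font-size:18.0pt;'>")
print(pattern.match(html, inner).group())
```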
To steer the engine in the right direction, and assuming that > never appears inside the attribute value, the following regex matches what you want:
<span style='[^>]+'>TheTextToFind</span>
It works because, under that assumption, [^>]+ can't run past the > that closes the opening tag. The attempt starting at the first <span therefore fails, and the engine moves on to the inner <span.
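The same check in Python's re module (again just an illustrative choice of engine) confirms that the [^>]+ version skips the outer span and matches only the inner one:

```python
import re

html = ("<span style='font-size:11.0pt;'>DON'T_WANT_THIS_MATCHED "
        "<span style='font-size:18.0pt;'>TheTextToFind</span></span>")

fixed = re.compile(r"<span style='[^>]+'>TheTextToFind</span>")
m = fixed.search(html)

# [^>]+ cannot cross the '>' that ends the first opening tag, so every
# attempt starting at index 0 fails and the engine tries later positions.
print(m.group())  # <span style='font-size:18.0pt;'>TheTextToFind</span>
print(m.start() == html.index("<span style='font-size:18.0pt;'>"))  # True
```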
However, I hope you are not doing this as part of a program that extracts information from an HTML page. Use an HTML parser for that purpose.
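As one possible sketch, Python's standard-library html.parser.HTMLParser can do this without any regex. The class name SpanTextFinder and the rule "the text directly inside the span equals the target" are hypothetical choices for illustration, not the only way to frame the problem:

```python
from html.parser import HTMLParser


class SpanTextFinder(HTMLParser):
    """Hypothetical helper: record the style of each <span> whose direct text equals `target`."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.stack = []  # style attribute of each currently open <span>
        self.found = []  # styles of spans whose direct text matched

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.stack.append(dict(attrs).get("style"))

    def handle_endtag(self, tag):
        if tag == "span" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # The innermost open <span> is the one that directly contains this text.
        if self.stack and data.strip() == self.target:
            self.found.append(self.stack[-1])


html = ("<span style='font-size:11.0pt;'>DON'T_WANT_THIS_MATCHED "
        "<span style='font-size:18.0pt;'>TheTextToFind</span></span>")

finder = SpanTextFinder("TheTextToFind")
finder.feed(html)
finder.close()
print(finder.found)  # ['font-size:18.0pt;']
```

Because the parser tracks nesting explicitly, a > inside an attribute value is not a problem here, unlike with the [^>]+ regex.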
To understand why the regex matches the way it does, you need to understand that .+? will backtrack (that is, expand one character at a time) until it finds a match for the sequel ('>TheTextToFind</span>).
# Matching .+?
# Since +? is lazy, it matches . once (to fulfill the minimum repetition), and
# increases the number of repetitions only when the sequel fails to match
<span style='f # FAIL. Can't match closing '
<span style='fo # FAIL. Can't match closing '
...
<span style='font-size:11.0pt; # PROCEED. But FAIL later, since it can't match T in The
<span style='font-size:11.0pt;' # FAIL. Can't match closing '
...
<span style='font-size:11.0pt;'>DON' # PROCEED. But FAIL later, since it can't match closing >
...
<span style='font-size:11.0pt;'>DON'T_WANT_THIS_MATCHED <span style='
# PROCEED. But FAIL later, since it can't match closing >
...
<span style='font-size:11.0pt;'>DON'T_WANT_THIS_MATCHED <span style='font-size:18.0pt;
# PROCEED. MATCH FOUND.
As you can see, .+? tries increasingly long substrings and ends up matching font-size:11.0pt;'>DON'T_WANT_THIS_MATCHED <span style='font-size:18.0pt;, which allows the sequel '>TheTextToFind</span> to match.
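The trace above can be reproduced with a small simulation. This is a sketch under simplifying assumptions: the start position is fixed at index 0 (the only one that matters here), and the hypothetical helper lazy_dot_plus only mimics .+? followed by a literal sequel, not a full regex engine:

```python
import re

html = ("<span style='font-size:11.0pt;'>DON'T_WANT_THIS_MATCHED "
        "<span style='font-size:18.0pt;'>TheTextToFind</span></span>")
prefix = "<span style='"                     # literal part before .+?
sequel = re.compile(r"'>TheTextToFind</span>")


def lazy_dot_plus(s, start, sequel):
    """Hypothetical helper mimicking .+? followed by `sequel` at index `start`.

    Tries 1, 2, 3, ... characters (shortest first, as a lazy quantifier does)
    and returns the first substring after which the sequel matches, or None.
    """
    for end in range(start + 1, len(s) + 1):
        if s[end - 1] == "\n":
            return None  # '.' does not match a newline without re.DOTALL
        if sequel.match(s, end):
            return s[start:end]
    return None


start = len(prefix)  # the literal "<span style='" matches at index 0
consumed = lazy_dot_plus(html, start, sequel)
print(consumed)

# Lengths at which the sequel's opening ' matches: the PROCEED lines in the trace.
proceeds = [html[start:end] for end in range(start + 1, len(html))
            if html[end] == "'"]
print(len(proceeds))  # 4

# The real engine agrees with the simulation.
real = re.search(r"<span style='(.+?)'>TheTextToFind</span>", html)
print(real.group(1) == consumed)  # True
```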