Possible Duplicate:
RegEx match open tags except XHTML self-contained tags
I have a string containing HTML like this:
<html>
<div>
<p>this is sample content</p>
</div>
<div>
<p>this is another sample</p>
<span class="test">this sample should not caught</span>
<div>
this is another sample
</div>
</div>
</html>
Now I want to search for the word "sample" in this string, but I should not get the "sample" that is inside the <span>...</span>.
I want to do this with a regex. I have tried a lot but can't get it to work; any help is greatly appreciated.
Thanks in advance.
This approach is quite brittle and fails if span tags can be nested. If they can't, try:
(?s)sample(?!(?:(?!</?span).)*</span>)
This matches "sample" only if the next span tag after it (if any) is not a closing tag.
Explanation:
(?s)              # Switch on dot-matches-all mode
sample            # Match "sample".
(?!               # only if it's not followed by the following regex:
  (?:             # Match...
    (?!</?span)   # (unless we're at the start of a span tag)
    .             # any character
  )*              # any number of times.
  </span>         # Match a closing span tag.
)                 # End of lookahead
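
To check this behaviour, here is a minimal sketch that runs the pattern against the HTML from the question using Python's `re` module. The choice of Python and the names `HTML`, `OUTSIDE_SPAN`, and `find_outside_span` are assumptions made for illustration; the question does not say which regex engine is in use, and engines without lookahead support will not accept this pattern.

```python
import re

# The HTML from the question, reproduced as a plain string.
HTML = """<html>
<div>
<p>this is sample content</p>
</div>
<div>
<p>this is another sample</p>
<span class="test">this sample should not caught</span>
<div>
this is another sample
</div>
</div>
</html>"""

# Pattern from the answer: "sample" is rejected when a closing </span>
# can be reached without crossing the start of another span tag.
OUTSIDE_SPAN = re.compile(r"(?s)sample(?!(?:(?!</?span).)*</span>)")


def find_outside_span(text):
    """Return the start offset of every "sample" the pattern accepts."""
    return [m.start() for m in OUTSIDE_SPAN.finditer(text)]


offsets = find_outside_span(HTML)
print(offsets)
```

Of the four occurrences of "sample" in the input, the one inside the span is skipped and the other three are reported.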
To match "sample" only if it's inside neither a span nor a p tag, you can use:
(?s)sample(?!(?:(?!</?span).)*</span>)(?!(?:(?!</?p).)*</p>)
But all this depends entirely on the tags being unnested (i.e., no two tags of the same kind may be nested) and correctly balanced (which often isn't the case with p tags).
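
Both the combined pattern and the nesting caveat can be tried out with a small Python sketch. The sample strings `flat` and `nested` are made up for illustration and are not from the question; Python's `re` module is again an assumed choice of engine.

```python
import re

# Patterns copied from the answer.
OUTSIDE_SPAN = re.compile(r"(?s)sample(?!(?:(?!</?span).)*</span>)")
OUTSIDE_SPAN_AND_P = re.compile(
    r"(?s)sample(?!(?:(?!</?span).)*</span>)(?!(?:(?!</?p).)*</p>)"
)

# Hypothetical input with no nesting: one "sample" each in a p, a span, a div.
flat = "<p>a sample</p><span>b sample</span><div>c sample</div>"

# Hypothetical input with a span nested inside another span.
nested = "<span>outer sample <span>inner</span> tail</span>"

# Only the occurrence inside the div survives both lookaheads.
flat_hits = [m.start() for m in OUTSIDE_SPAN_AND_P.finditer(flat)]

# The next span tag after "sample" is an opening one, so the lookahead
# cannot see the outer closing tag and the word is accepted by mistake.
nested_hits = [m.start() for m in OUTSIDE_SPAN.finditer(nested)]

print(flat_hits, nested_hits)
```

The second result is the failure mode described above: "sample" sits inside the outer span, yet it is still matched because the pattern only looks as far as the next span tag.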