
How to skip content inside a <span class=""> </span> tag during a regex search?

Possible Duplicate:
RegEx match open tags except XHTML self-contained tags

I have an HTML string like this:

      <p>this is sample content</p>
      <p>this is another sample</p>
      <span class="test">this sample should not caught</span>
       this is another sample

Now I want to search for the word "sample" in this string, but I should not get the "sample" that is inside the <span>...</span>.

I want to do this using a regex. I have tried a lot but can't get it to work; any help is greatly appreciated.

Thanks in advance.


  • This is quite brittle and fails if span tags can be nested. If you don't have those, try

        (?s)sample(?!(?:(?!</?span).)*</span>)

    This matches sample only if the next span tag after it (if any) is not a closing tag. Broken down:


    (?s)          # Switch on dot-matches-all mode
    sample        # Match "sample".
    (?!           # only if it's not followed by the following regex:
     (?:          #  Match...
      (?!</?span) #   (unless we're at the start of a span tag)
      .           #   any character
     )*           #  any number of times.
     </span>      #  Match a closing span tag.
    )             # End of lookahead
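    As a quick check, here is a minimal Python sketch that runs the pattern from the breakdown above against the HTML from the question. The names `SAMPLE_OUTSIDE_SPAN`, `html` and `offsets` are made up for illustration, and Python is only an assumption, since the question doesn't say which regex engine is in use.

```python
import re

# The same pattern as in the breakdown above, compiled in verbose mode.
# re.DOTALL plays the role of the (?s) flag.
SAMPLE_OUTSIDE_SPAN = re.compile(
    r"""
    sample                 # match "sample"
    (?!                    # only if it is not followed by:
      (?:(?!</?span).)*    #   any run of characters that contains no span tag
      </span>              #   and then a closing span tag
    )
    """,
    re.DOTALL | re.VERBOSE,
)

# The HTML from the question.
html = (
    "<p>this is sample content</p>\n"
    "<p>this is another sample</p>\n"
    '<span class="test">this sample should not caught</span>\n'
    " this is another sample\n"
)

# Start offsets of every "sample" that is not inside a span.
offsets = [m.start() for m in SAMPLE_OUTSIDE_SPAN.finditer(html)]
print(len(offsets))
```

    The two occurrences inside the p tags and the trailing one are matched; the one inside the span is skipped because the next span tag after it is the closing one.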

    To match sample only if it's neither within a span nor a p, the same lookahead can be extended to cover both tag names.


    But all this depends entirely on tags being unnested (i.e., no two tags of the same kind may be nested) and correctly balanced (which often isn't the case with p tags).
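    One way the span-or-p variant might look, as a sketch only: the helper `outside_tags_pattern` below is hypothetical and simply swaps an alternation of tag names into the same lookahead. The `\b` after the tag names is an addition of mine so that `<pre>` isn't mistaken for `<p>`. The same caveats about nesting and balancing apply.

```python
import re

def outside_tags_pattern(word, tags):
    """Hypothetical helper: match `word` unless the next listed tag
    after it is a closing tag (i.e. `word` sits inside one of `tags`)."""
    names = "|".join(re.escape(t) for t in tags)
    return re.compile(
        # word, not followed by: tag-free text, then a closing listed tag
        rf"{re.escape(word)}(?!(?:(?!</?(?:{names})\b).)*</(?:{names})>)",
        re.DOTALL,
    )

# The HTML from the question.
html = (
    "<p>this is sample content</p>\n"
    "<p>this is another sample</p>\n"
    '<span class="test">this sample should not caught</span>\n'
    " this is another sample\n"
)

pattern = outside_tags_pattern("sample", ["span", "p"])
hits = [m.start() for m in pattern.finditer(html)]
print(len(hits))
```

    On the question's HTML this leaves only the last occurrence, the one that sits outside every tag. If the input can't be guaranteed to be flat and balanced, an HTML parser is the safer route.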