Tags: regex, tags, server-side, detection

Regex to find all <body> tags not written by server-side code


I've been trying to write this regex, but it's driving me crazy. Basically, I need a regex that detects all <body> tags that are not written by my server-side code. My plan is to replace each of those <body> tags with something like <body><%=CallToFunction()%>.

(This is part of a search & replace within UltraEdit.)

For example:

<body>                                          //should be found
<body class="normal">                           //should be found
<body class="<% Response.Write("normal") %>"    //should be found
<html><body class="normal">                     //should be found

Response.Write("<body class=""normal"">")       //should not be found (a)
Response.Write(" <body>")                       //should not be found (b)
Response.Write("<html><body><h1>...")           //should not be found (c)

message = "<html><body>...</body></html>"       //should not be found (d)
Response.Write(message)

Response.Write("<html>
                <head></head>
                <body class=""normal"">
                    <h1>...</h1>")              //should not be found (e)

The regex I currently have is ([^"]<body.*[^>]*>). The problem is that it still finds <body> tags when there is a space between the " and the <body> (see example (b)). It also still finds (c).
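To see what this first pattern does, here is a minimal Python sketch that runs it against lines modelled on examples (a), (b) and (c). It assumes Python's re module behaves like UltraEdit's Perl-style engine for this pattern, which may not hold in every detail; the sample strings and variable names are illustrative only.

```python
import re

# The first pattern from the question, copied verbatim.
FIRST_ATTEMPT = re.compile(r'([^"]<body.*[^>]*>)')

# Sample lines modelled on examples (a), (b) and (c).
samples = {
    "a": 'Response.Write("<body class=""normal"">")',
    "b": 'Response.Write(" <body>")',
    "c": 'Response.Write("<html><body><h1>...")',
}

# True means the pattern reports a <body> tag on that line,
# which is a false positive for all three examples.
results = {label: FIRST_ATTEMPT.search(line) is not None
           for label, line in samples.items()}

for label, found in results.items():
    print(label, "found" if found else "not found")
```

The character class [^"] only inspects the single character directly before <body, so any other character in that position (a space, or the > that closes <html>) lets the match through.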

As for (e), I'm really clueless. I wonder whether that is even possible to detect.

Can anyone help me?

Thanks!

EDIT

I now have ^(?!Response|")(<body.*[^>]*>), which works pretty well. But it doesn't work when the <body> tag is indented in the document. So I need something like: <body preceded by anything (or nothing) other than Response or ".
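The indentation problem can be reproduced with a short Python sketch. It assumes each line is tested on its own, as a rough stand-in for a line-based editor search; the indented sample line is hypothetical.

```python
import re

# Pattern from the EDIT, copied verbatim. Without re.MULTILINE,
# ^ only matches at the very start of the string being tested.
EDIT_PATTERN = re.compile(r'^(?!Response|")(<body.*[^>]*>)')

flush_left = '<body class="normal">'
indented = '    <body class="normal">'  # hypothetical indented variant
server_side = 'Response.Write("<body class=""normal"">")'

flush_left_found = EDIT_PATTERN.search(flush_left) is not None
indented_found = EDIT_PATTERN.search(indented) is not None
server_side_found = EDIT_PATTERN.search(server_side) is not None

print(flush_left_found, indented_found, server_side_found)
```

The lookahead itself accepts the indented line; the match fails because <body must follow immediately after ^, and nothing in the pattern consumes the leading whitespace.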

ANSWER

The regex I eventually ended up using was based on Michael Allen's answer and was:

^(?!Response|")([\t ]*)(<body.*[^>]*>)

It did not solve (e), but I guess I'll handle those cases manually.
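A likely reason (e) resists this approach is that, looked at one line at a time, the continuation line holding the <body> tag is indistinguishable from an indented static tag. The Python sketch below illustrates this under the assumption that the search is applied line by line; the helper is_flagged and the sample lines are hypothetical.

```python
import re

# Final pattern from the question, copied verbatim.
FINAL_PATTERN = re.compile(r'^(?!Response|")([\t ]*)(<body.*[^>]*>)')

def is_flagged(line):
    """Return True if the pattern reports a <body> tag on this line."""
    return FINAL_PATTERN.search(line) is not None

# Example (e), split into its individual lines.
example_e = [
    'Response.Write("<html>',
    '                <head></head>',
    '                <body class=""normal"">',
    '                    <h1>...</h1>")',
]

# The indented static tag is now found, as intended.
indented_static_flagged = is_flagged('    <body class="normal">')

# The continuation line of (e) is found too, which is the unsolved case.
flagged_in_e = [line.strip() for line in example_e if is_flagged(line)]

# A tag that follows other markup on the same line is not found.
after_html_flagged = is_flagged('<html><body class="normal">')

print(indented_static_flagged, flagged_in_e, after_html_flagged)
```

The last check suggests that this pattern also skips a <body> tag preceded by other markup such as <html> on the same line, so lines like that may need a manual pass as well.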


Solution

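  • I'm not really sure how you would deal with the last example, but the following regex handles the four static tags and the single-line Response.Write examples (a) through (c) correctly. It still matches (d), because that line begins with message rather than Response.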

    ^(?!Response).*<body.*>
    

    The trick here is using a negative lookahead to rule out any line that has Response at the beginning.

    Hopefully that's a start for you.
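As a rough check of the answer's pattern, the following Python sketch runs it against the single-line examples from the question. It assumes line-by-line matching and that Python's re module treats the lookahead the same way the editor does; the dictionary names are illustrative.

```python
import re

# Pattern from the answer, copied verbatim.
ANSWER_PATTERN = re.compile(r'^(?!Response).*<body.*>')

# Lines the question says should be found.
should_find = [
    '<body>',
    '<body class="normal">',
    '<body class="<% Response.Write("normal") %>"',
    '<html><body class="normal">',
]

# Lines the question says should not be found, keyed by example label.
should_not_find = {
    "a": 'Response.Write("<body class=""normal"">")',
    "b": 'Response.Write(" <body>")',
    "c": 'Response.Write("<html><body><h1>...")',
    "d": 'message = "<html><body>...</body></html>"',
}

found_static = [ANSWER_PATTERN.search(line) is not None for line in should_find]
found_server = {label: ANSWER_PATTERN.search(line) is not None
                for label, line in should_not_find.items()}

print(found_static)
print(found_server)
```

Because the lookahead only tests the first characters of the line, a server-side line that starts with anything other than Response, such as the assignment in (d), still gets through.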