I've been trying to create this regex but it's driving me crazy. What I basically need is a regex that detects all <body> tags which are not written through my server-side code. My plan is to replace all those <body> tags by something like <body><%=CallToFunction()%>.
(This is part of a search & replace within UltraEdit.)
E.g.:
<body> //should be found
<body class="normal"> //should be found
<body class="<% Response.Write("normal") %>" //should be found
<html><body class="normal"> //should be found
Response.Write("<body class=""normal"">") //should not be found (a)
Response.Write(" <body>") //should not be found (b)
Response.Write("<html><body><h1>...") //should not be found (c)
message = "<html><body>...</body></html>" //should not be found (d)
Response.Write(message)
Response.Write("<html>
<head></head>
<body class=""normal"">
<h1>...</h1>") //should not be found (e)
The regex I have currently is: ([^"]<body.*[^>]*>). But the problem there is that it will still find <body> tags with a space between the " and the <body> (see example (b)). It would also still find (c).
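A minimal sketch of that behaviour, using Python's re module as a stand-in for UltraEdit's regex engine (an assumption; the two engines may differ in details). The sample lines are (a) to (c) from the list above with the trailing comments removed; the helper name flagged is made up for this illustration.

```python
import re

# The first attempt, copied verbatim from the question.
FIRST_ATTEMPT = re.compile(r'([^"]<body.*[^>]*>)')

# Examples (a)-(c) from the question, without the trailing comments.
samples = {
    "a": 'Response.Write("<body class=""normal"">")',
    "b": 'Response.Write(" <body>")',
    "c": 'Response.Write("<html><body><h1>...")',
}

def flagged(line):
    """Return True if the pattern finds a <body> tag somewhere in the line."""
    return FIRST_ATTEMPT.search(line) is not None

for label, line in samples.items():
    print(label, flagged(line))
```

In this sketch (a) is skipped because the quote sits directly in front of <body, while (b) and (c) are flagged: the character in front of <body is a space in (b) and a > in (c), and both satisfy [^"].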
And for (e) I am really clueless. Wondering if that is even possible to detect.
Can anyone help me?
Thanks!
EDIT
I now have ^(?!Response|")(<body.*[^>]*>) which works pretty well. But it doesn't work when the <body> tag is indented in the document. So I'd need something like <body preceded by anything (or nothing) other than Response or ".
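The indentation problem can be reproduced with a short Python sketch. It assumes Python's re module treats this pattern the same way UltraEdit does, which has not been verified here; the helper name is_found is hypothetical.

```python
import re

# Second attempt from the EDIT, copied verbatim.
SECOND_ATTEMPT = re.compile(r'^(?!Response|")(<body.*[^>]*>)')

def is_found(line):
    """Return True if the pattern matches the given single line."""
    return SECOND_ATTEMPT.search(line) is not None

print(is_found('<body class="normal">'))        # tag at column 0
print(is_found('    <body class="normal">'))    # indented tag
print(is_found('<html><body class="normal">'))  # tag preceded by other markup
print(is_found('Response.Write(" <body>")'))    # server-side output
```

Only the first line is found. Because <body has to follow ^ immediately, both the indented tag and the tag that comes after <html> are missed, even though the negative lookahead lets those lines through.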
ANSWER
The regex I eventually ended up using was based on Michael Allen's answer and was:
^(?!Response|")([\t ]*)(<body.*[^>]*>)
It did not solve (e), but I guess I'll do some manual work for those cases then.
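As a rough illustration of the search & replace step, here is a Python sketch built on that final pattern. The file content and the helper name patch_line are invented for the example, and Python's re module is only a stand-in for UltraEdit's engine. The sketch works one line at a time because [^>] also matches newline characters in Python, so on a whole-file string the match could run on into the following line.

```python
import re

# Final pattern from the question: group 1 is the indentation, group 2 the tag.
FINAL = re.compile(r'^(?!Response|")([\t ]*)(<body.*[^>]*>)')

def patch_line(line):
    """Append the server-side call after a matched <body ...> tag on one line."""
    return FINAL.sub(r'\1\2<%=CallToFunction()%>', line)

# Hypothetical file content, invented for this illustration.
lines = [
    '<html>',
    '\t<body class="normal">',
    'Response.Write(" <body>")',
]

for line in lines:
    print(patch_line(line))
```

One caveat: .* is greedy, so if more markup follows the tag on the same line it is pulled into group 2 and the call lands after that markup rather than directly after <body ...>.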
Not really sure how you would deal with the last example, but the following regex will match correctly with most of the other examples you provided (it will still match (d), since that line does not start with Response).
^(?!Response).*<body.*>
The trick here is using a negative lookahead to knock out any lines that have Response at the beginning.
Hopefully that's a start for you.
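To check the pattern against the single-line examples from the question, a small Python sketch follows. It assumes Python's re module is close enough to UltraEdit's engine for this pattern; the expected values are taken from the "should be found" / "should not be found" comments in the question.

```python
import re

# Pattern from the answer, copied verbatim.
ANSWER = re.compile(r'^(?!Response).*<body.*>')

# (line, should it be found?) pairs from the question, comments removed.
examples = [
    ('<body>', True),
    ('<body class="normal">', True),
    ('<html><body class="normal">', True),
    ('Response.Write("<body class=""normal"">")', False),  # (a)
    ('Response.Write(" <body>")', False),                   # (b)
    ('Response.Write("<html><body><h1>...")', False),       # (c)
    ('message = "<html><body>...</body></html>"', False),   # (d)
]

for line, wanted in examples:
    found = ANSWER.match(line) is not None
    status = "ok" if found == wanted else "MISMATCH"
    print(f"{status:8} found={found!s:5} {line}")
```

Every line comes out as wanted except (d): it starts with message rather than Response, so the lookahead does not exclude it. Adding further alternatives to the lookahead is one possible way to handle such lines, but that depends on what the rest of the code base looks like.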