I'm working on a test case that visits a page, gets the page source and saves it to an HTML file. Before saving the source, I need to strip out all JavaScript, from each opening <script type="text/javascript"> tag to its closing </script> tag. After going through numerous online resources, I came up with this pattern:
<script type="text/javascript">([\\s\\S]*?)<\\/script>
However, the regular expression does not seem to work when I enter it into the test case. Does anyone have any suggestions?
More Info:
The page source contains many instances of JavaScript, some of which span multiple lines, so I believe I need to prefix the expression with the inline flags (?ims). In my attempt above, you'll also see that I've escaped the backslashes, since I read somewhere that this was necessary.
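Doubling the backslashes is only correct when the pattern lives inside an ordinary string literal. If it is typed straight into a field that is not a string literal, the regex engine receives the doubled backslashes verbatim, and \\s then means "a literal backslash followed by s". The sketch below illustrates both points in Python, which is only an assumption made for illustration since the test harness language isn't stated; the sample page string is made up.

```python
import re

# The pattern typed into an ordinary string literal: each regex backslash is doubled.
# This doubling is correct only because this is a (non-raw) string literal.
escaped = "<script type=\"text/javascript\">([\\s\\S]*?)<\\/script>"
# The same pattern as a raw string literal, where no doubling is needed.
raw = r'<script type="text/javascript">([\s\S]*?)<\/script>'

# Both literals yield identical pattern text once the language has parsed them.
assert escaped == raw

# Made-up sample page with an upper-case tag and a multi-line script body.
page = '<html>\n<SCRIPT type="text/javascript">\nvar x = 1;\n</script>\n<body>hi</body>'

# Inline flags are written (?i), with a question mark; a bare (ims) would be a
# capturing group that literally matches "ims". [\s\S] already crosses newlines,
# so s is unnecessary, and m only changes ^ and $, which this pattern never uses.
cleaned = re.sub("(?i)" + raw, "", page)
print(cleaned)
```

Note that this pattern still only matches tags whose attribute text is exactly type="text/javascript"; a <script> tag with other attributes would survive.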
Example of the source code:
<html>
<script type="text/javascript">
some multiline javascript
</script>
<script type="text/javascript"> some single line javascript </script>
<body>
body content
</body>
<script type="text/javascript">
some more javascript
</script>
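Applied to the example source above, the goal described in the question (strip every script, then save the page to an HTML file) might look like the following minimal sketch. It is written in Python as an assumption; strip_scripts, SCRIPT_RE and the output file name are hypothetical, and a regex is a pragmatic shortcut here rather than a full HTML parser.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical helper pattern: removes every <script ...>...</script> element.
# [^>]* tolerates any attributes; [\s\S]*? lazily spans newlines and stops at
# the first </script>, so adjacent script blocks are not merged into one match.
SCRIPT_RE = re.compile(r"<script\b[^>]*>[\s\S]*?</script>", re.IGNORECASE)

def strip_scripts(source: str) -> str:
    """Return the page source with all script elements removed."""
    return SCRIPT_RE.sub("", source)

page_source = """<html>
<script type="text/javascript">
some multiline javascript
</script>
<script type="text/javascript"> some single line javascript </script>
<body>
body content
</body>
<script type="text/javascript">
some more javascript
</script>"""

cleaned = strip_scripts(page_source)

# Save the cleaned source; the temp-directory location is only for illustration.
out_path = Path(tempfile.gettempdir()) / "page_without_scripts.html"
out_path.write_text(cleaned, encoding="utf-8")
print(cleaned)
```

The lazy quantifier is the key design choice: a greedy [\s\S]* would run from the first <script> to the last </script> and delete the body content in between.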
Here is my attempt:
/<script[^>]*>[^\0]*?<\/script>/gi
Explanation:
# <script             # match the start of the tag
# [^>]*>              # match any attributes, up to and including the ">" character
# [^\0]*?<\/script>   # lazily match any character except NUL, up to the first closing tag
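For readers whose test code is not JavaScript, the same pattern can be ported directly. The sketch below assumes Python and uses re.VERBOSE so that each piece carries the comment from the breakdown above; "[^\x00]" stands in for JavaScript's "[^\0]", and the sample string is made up.

```python
import re

# Port of the JavaScript pattern, laid out with re.VERBOSE so whitespace and
# "#" comments inside the pattern are ignored by the regex engine.
SCRIPT_BLOCK = re.compile(
    r"""
    <script        # match the start of the tag
    [^>]*>         # match any attributes, up to and including ">"
    [^\x00]*?      # lazily match any non-NUL character, newlines included
    </script>      # stop at the first closing tag
    """,
    re.IGNORECASE | re.VERBOSE,
)

# Made-up sample: an upper-case external script and a multi-line inline one.
sample = '<p>a</p><SCRIPT src="x.js"></SCRIPT>\n<script>\nalert(1)\n</script><p>b</p>'
result = SCRIPT_BLOCK.sub("", sample)
print(result)
```

The forward slash needs no escaping here: the \/ in the JavaScript version exists only because / delimits a regex literal, and Python patterns are plain strings.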