I know that using regular expressions on HTML is not preferred, but I am still confused as to why this doesn't work:
I'm trying to remove the "head" from a document.
Here's the doc:
<html>
<head>
<!--
a comment within the head
-->
</head>
<body>
stuff in the body
</body>
</html>
My code:
$matches = array();
$result = preg_match('/(?:<head[^>]*>)(.*?)(<\/head>)/is', $contents, $matches);
var_dump($matches);
This does not actually work. Here's the output I see:
array(3) { [0]=> string(60) " " [1]=> string(47) " " [2]=> string(7) "" }
However, if I adjust the HTML doc to not have the comment
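Your script is working fine; the result just isn't displaying correctly because the browser renders the HTML tags inside the dump. You can tell from the lengths in your var_dump output: element [2] reports string(7), exactly the length of "</head>", yet it prints as "". Try: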
$result = preg_match ('/(?:<head[^>]*>)(.*?)(<\/head>)/is', $contents, $matches);
ob_start(); // Capture the result of var_dump
var_dump ($matches);
echo htmlentities(ob_get_clean()); // Escape HTML in the dump
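If you'd rather check the match without a browser at all, a minimal CLI sketch could run the same pattern against the sample document and look at the raw strings. The heredoc `$contents` below just reproduces the document from the question; it's an assumption that your real input looks the same.

```php
<?php
// Sample document from the question (assumed to match the real input).
$contents = <<<HTML
<html>
<head>
<!--
a comment within the head
-->
</head>
<body>
stuff in the body
</body>
</html>
HTML;

$matches = array();
$result = preg_match('/(?:<head[^>]*>)(.*?)(<\/head>)/is', $contents, $matches);

// preg_match returns 1 when the pattern matched, 0 when it did not.
echo "matched: $result\n";

// Group 2 is the closing tag: 7 characters, invisible when rendered as HTML.
echo 'group 2 length: ' . strlen($matches[2]) . "\n";

// Escape the whole match so the comment is shown instead of being hidden.
echo htmlspecialchars($matches[0]), "\n";
```

Run from the command line, the escaped output shows the `<!-- ... -->` comment inside the captured head, which is what the browser was swallowing.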
Also, as has already been said, you need to use preg_replace to replace the match with '' in order to actually remove the head.
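A minimal sketch of that preg_replace call, again assuming the sample document from the question as input. The pattern is the same one from the question without the capture groups, since nothing needs to be extracted:

```php
<?php
// Sample document from the question (assumed to match the real input).
$contents = <<<HTML
<html>
<head>
<!--
a comment within the head
-->
</head>
<body>
stuff in the body
</body>
</html>
HTML;

// Replace the whole <head>...</head> block with an empty string.
// The s flag lets . span newlines; i makes the tag names case-insensitive.
$withoutHead = preg_replace('/<head[^>]*>.*?<\/head>/is', '', $contents);

// preg_replace returns null on error (e.g. the backtrack limit was hit).
if ($withoutHead === null) {
    exit('preg_replace failed, error code ' . preg_last_error() . "\n");
}

echo $withoutHead;
```

This leaves an empty line where the head used to be. For arbitrary real-world HTML, a parser such as PHP's DOMDocument is more robust than a regex.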