I'm trying to run a script here.
I did put some content into a variable $x
.
$x
is full of html code.
Now I want to replace / remove all html comments and write it to a file.
I have this regex: <!--([\s\S]*?)-->
.
and it works fine in editors or www.phpliveregex.com.
but in my php it doesn't.
Maybe you can help me out.
//$x = content
$summary2 = preg_replace("<!--([\s\S]*?)-->", "", $x);
fwrite($fh, $summary2);
Edit: This is some example of the content i want to get rid off.
</ul>
<p>
Evaluation<!--[if gte mso 9]><xml>
<o:OfficeDocumentSettings>
<o:AllowPNG />
<o:TargetScreenSize>1024x768</o:TargetScreenSize>
</o:OfficeDocumentSettings>
</xml><![endif]--><!--[if gte mso 9]><xml>
<w:WordDocument>
<w:View>Normal</w:View>
<w:Zoom>0</w:Zoom>
<w:HyphenationZone>21</w:HyphenationZone>
<w:PunctuationKerning />
<w:ValidateAgainstSchemas />
<w:SaveIfXMLInvalid>false</w:SaveIfXMLInvalid>
<w:IgnoreMixedContent>false</w:IgnoreMixedContent>
<w:AlwaysShowPlaceholderText>false</w:AlwaysShowPlaceholderText>
<w:Compatibility>
<w:BreakWrappedTables />
<w:SnapToGridInCell />
<w:WrapTextWithPunct />
<w:UseAsianBreakRules />
<w:DontGrowAutofit />
</w:Compatibility>
</w:WordDocument>
</xml><![endif]--><!--[if gte mso 9]><xml>
<w:LatentStyles DefLockedState="false" LatentStyleCount="156">
</w:LatentStyles>
</xml><![endif]--><!--[if gte mso 10]>
<style>
/* Style Definitions */
table.MsoNormalTable
{mso-style-name:"Normale Tabelle";
mso-tstyle-rowband-size:0;
mso-tstyle-colband-size:0;
mso-style-noshow:yes;
mso-style-parent:"";
mso-padding-alt:0cm 5.4pt 0cm 5.4pt;
mso-para-margin:0cm;
mso-para-margin-bottom:.0001pt;
mso-pagination:widow-orphan;
font-size:10.0pt;
font-family:"Times New Roman";
mso-ansi-language:#0400;
mso-fareast-language:#0400;
mso-bidi-language:#0400;}
</style>
<![endif]--></p>
<ul>
<li>
A sequence of symbols and characters expressing a string or pattern to be searched for within a longer piece of text.
When using the PCRE functions, it is required that the pattern is enclosed by delimiters. A delimiter can be any non-alphanumeric, non-backslash, non-whitespace character.
Often used delimiters are forward slashes (/), hash signs (#) and tildes (~).
It is also possible to use bracket style delimiters where the opening and closing brackets are the starting and ending delimiter, respectively. (), {}, [] and <> are all valid bracket style delimiter pairs.
<!--([\s\S]*?)-->
?So your RegEx, incidentally, has delimiters inside which is starting <
and ending >
characters and correspondingly your RegEx pattern would be !--([\s\S]*?)--
which may not be what you want.
Wrap it within a pair of delimiters. E.g. /<!--([\s\S]*?)-->/
No, it is not! Never (but to not lie about it I do it sometimes!)! Regular Expressions are not made to modify HTML/XML elements. You should go with DOMDocument
class for this specific purpose which will make your life much more easier and cleaner:
$dom = new DOMDocument();
$dom->loadHtml($str, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//comment()') as $comment) {
$comment->parentNode->removeChild($comment);
}
echo $dom->saveHTML();