preg_replace() pattern starting with < and ending with > doesn't find HTML comments as intended

I'm trying to run a script. I put some content into a variable $x, which is full of HTML code. Now I want to remove all HTML comments from it and write the result to a file.

I have this regex: <!--([\s\S]*?)-->. It works fine in editors, but in my PHP it doesn't. Maybe you can help me out.

//$x = content
$summary2 = preg_replace("<!--([\s\S]*?)-->", "", $x);
fwrite($fh, $summary2);

Edit: This is an example of the content I want to get rid of.

	Evaluation<!--[if gte mso 9]><xml>
<o:AllowPNG />
</xml><![endif]--><!--[if gte mso 9]><xml>
<w:PunctuationKerning />
<w:ValidateAgainstSchemas />
<w:BreakWrappedTables />
<w:SnapToGridInCell />
<w:WrapTextWithPunct />
<w:UseAsianBreakRules />
<w:DontGrowAutofit />
</xml><![endif]--><!--[if gte mso 9]><xml>
<w:LatentStyles DefLockedState="false" LatentStyleCount="156">
</xml><![endif]--><!--[if gte mso 10]>
/* Style Definitions */
{mso-style-name:"Normale Tabelle";
mso-padding-alt:0cm 5.4pt 0cm 5.4pt;
font-family:"Times New Roman";


  • What are Regular Expressions?

    A sequence of symbols and characters expressing a string or pattern to be searched for within a longer piece of text.

    What are delimiters?

    When using the PCRE functions, it is required that the pattern is enclosed by delimiters. A delimiter can be any non-alphanumeric, non-backslash, non-whitespace character.

    Which pair of characters can be used as delimiters?

    Often used delimiters are forward slashes (/), hash signs (#) and tildes (~).

    It is also possible to use bracket style delimiters where the opening and closing brackets are the starting and ending delimiter, respectively. (), {}, [] and <> are all valid bracket style delimiter pairs.

    What about my case <!--([\s\S]*?)-->?

    Your RegEx happens to start with < and end with >, so PHP takes those two characters as the delimiters. The pattern that actually runs is !--([\s\S]*?)--, which is not what you want: it removes everything between the angle brackets and leaves an empty <> behind for each comment.

    What should I do?

    Wrap it within a pair of delimiters. E.g. /<!--([\s\S]*?)-->/

    Does it work?

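    Yes. With explicit delimiters in place, the leading < and trailing > belong to the pattern again, so the whole comment is matched and removed.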

    Is it a good practice?

    No, it is not! Never (though, to be honest, I do it sometimes!). Regular Expressions are not made for modifying HTML/XML elements. You should go with the DOMDocument class for this specific purpose, which will make your life much easier and your code cleaner:

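    $dom = new DOMDocument();
    $dom->loadHTML($x); // $x holds the HTML from the question
    $xpath = new DOMXPath($dom);
    // select every comment node in the document and detach it
    foreach ($xpath->query('//comment()') as $comment) {
        $comment->parentNode->removeChild($comment);
    }
    echo $dom->saveHTML();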

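To make the delimiter issue concrete, here is a minimal sketch that runs both versions of the pattern side by side. The input is a shortened piece of the sample from the question with a made-up trailing word "done" appended, and the variable names $html, $broken and $fixed are only for illustration.

```php
<?php
// Shortened sample from the question; the trailing "done" is an
// assumption added so the end of the comment is easy to spot.
$html = "Evaluation<!--[if gte mso 9]><xml>\n<o:AllowPNG />\n</xml><![endif]-->done";

// "<" and ">" are consumed as bracket-style delimiters here, so the
// pattern that actually runs is !--([\s\S]*?)--
$broken = preg_replace('<!--([\s\S]*?)-->', '', $html);

// Explicit "/" delimiters keep "<" and ">" as part of the pattern.
$fixed = preg_replace('/<!--([\s\S]*?)-->/', '', $html);

echo $broken, "\n"; // Evaluation<>done
echo $fixed, "\n";  // Evaluationdone
```

The first call does not fail, which is probably why the problem is easy to miss: it still replaces something, just not the angle brackets around the comment.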