php html regex html-parsing conditional-comments

Remove almost all HTML comments using Regex


Using this regular expression:

preg_replace( '/<!--(?!<!)[^\[>].*?-->/', '', $output )

I'm able to remove all HTML comments from my page except for anything that looks like this:

<!--[if IE 6]>
    Special instructions for IE 6 here
<![endif]-->

How can I modify this to also preserve HTML comments that include a unique phrase, such as "batcache"?

So an HTML comment like this:

<!--
generated 37 seconds ago
generated in 0.978 seconds
served from batcache in 0.004 seconds
expires in 263 seconds
-->

won't be removed.


This code seems to do the trick:

preg_replace_callback( '/<!--([\s\S]*?)-->/', function( $c ) { return ( strpos( $c[1], '<![' ) !== false || strpos( $c[1], 'batcache' ) !== false ) ? $c[0] : ''; }, $output )
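
For readability, the callback idea can be wrapped in a small function built on preg_replace_callback, the PHP function that accepts a callable as the replacement. This is only a sketch: the function name strip_html_comments and the $keep parameter are hypothetical, chosen for illustration.

```php
<?php
/**
 * Remove HTML comments, but keep any comment whose body contains
 * one of the marker strings. The function name and the $keep
 * parameter are illustrative assumptions, not an existing API.
 */
function strip_html_comments(string $html, array $keep = ['<![', 'batcache']): string
{
    $result = preg_replace_callback(
        '/<!--([\s\S]*?)-->/',
        function (array $m) use ($keep): string {
            foreach ($keep as $marker) {
                // $m[1] is the comment body, $m[0] the whole comment.
                if (strpos($m[1], $marker) !== false) {
                    return $m[0]; // keep this comment unchanged
                }
            }
            return ''; // drop every other comment
        },
        $html
    );

    // preg_replace_callback returns null on a PCRE error;
    // fall back to the untouched input in that case.
    return $result ?? $html;
}
```

Calling strip_html_comments( $output ) would then take the place of the one-liner, and further phrases to protect can be passed in the second argument.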

Solution

  • This should remove all the comments that contain neither "batcache" nor "[endif]". Matching is done between the two delimiters <!-- and -->.

    $result = preg_replace("/<!--((?!batcache)(?!\\[endif\\])[\\s\\S])*?-->/", "", $str);
    


    As other users have already said, it's not always safe to parse HTML with regex, but if you are reasonably sure what kind of HTML you will be parsing, it should work as expected. If the regex doesn't handle some particular use case, let me know.
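
To see how the two negative lookaheads behave, here is a small sketch that runs the pattern on a sample page. The sample markup and the variable names are made up for illustration. The pattern is the one from the answer, written in single quotes so the backslashes need no doubling.

```php
<?php
// Hypothetical sample input: one ordinary comment, one conditional
// comment, and one batcache comment.
$str = <<<'PAGE'
<p>Hello</p>
<!-- ordinary comment -->
<!--[if IE 6]>
    Special instructions for IE 6 here
<![endif]-->
<!--
served from batcache in 0.004 seconds
-->
PAGE;

// Each character between <!-- and --> is consumed only if neither
// "batcache" nor "[endif]" starts at that position, so a comment
// containing either phrase can never be matched as a whole.
$pattern = '/<!--((?!batcache)(?!\[endif\])[\s\S])*?-->/';

$result = preg_replace($pattern, '', $str);

echo $result;
```

Only the ordinary comment should be stripped here. The lookaheads are case-sensitive, and protecting another phrase would mean adding one more lookahead of the same form next to the existing two.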