PHP: strip iframe and script tags (no htmlentities())


I'm processing an XML file, and sometimes it contains "iframe" and "script" tags that I need to remove before I even parse the XML.

I'm trying some regular expressions, but I keep getting them wrong.

Test string:

    $teststring = '<p><iframe src="http://www.facebook.com/plugins/like.php?href=abcdef&layout=standard&show_faces=false&width=450&action=like&colorscheme=dark&height=35" scrolling="no" frameborder="0" style="border:none; overflow:hidden; width:450px; height:35px;" allowtransparency="true"></iframe></p>';

    // TODO: clean this up. Found this function on the net; more legacy stuff.
    $Rules = array(
        '@<script[^>]*?>.*?</script>@si', // Strip out javascript
        '@&(cent|#162);@i',               // Cent
        '@&(pound|#163);@i',              // Pound
        '@&(copy|#169);@i',               // Copyright
        '@&(reg|#174);@i',                // Registered
        '@&#(\d+);@e',                    // Evaluate as PHP (the /e modifier was removed in PHP 7)
        '@<iframe [^<]<.*?<\/iframe>@i',  // <-- PROBLEM: this is the rule I'm getting wrong
    );

    $Replace = array(
         '',
        chr( 162 ),
        chr( 163 ),
        chr( 169 ),
        chr( 174 ),
        'chr(\1)',
        '',

    );
    // expecting <p></p>
    $data = preg_replace( $Rules, $Replace, $teststring);

    echo $data;

Solution

  • Just try this:

    '@<iframe(?:(?!>).)*>.*?<\/iframe>@i'
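    The rule in the question fails because `[^<]` matches exactly one character and then expects a `<` right after it, so it never gets past the `src=...` attributes. In the pattern above, `(?:(?!>).)*` consumes everything up to the end of the opening tag instead. Here is a minimal sketch of how this could be wired up. It assumes the test string starts with `<p>`, shortens the iframe attributes for readability, and leaves out the entity rules; the lowercase `$rules` name is only for illustration.

```php
<?php
// Shortened version of the test string from the question.
// Assumption: the leading "<" of "<p>" was lost and is restored here.
$teststring = '<p><iframe src="http://www.facebook.com/plugins/like.php?href=abcdef&layout=standard" scrolling="no" frameborder="0"></iframe></p>';

$rules = array(
    '@<script[^>]*?>.*?</script>@si',       // script rule taken from the question
    '@<iframe(?:(?!>).)*>.*?<\/iframe>@i',  // iframe rule taken from the answer
);

// A single replacement string is applied to every pattern in the array.
$data = preg_replace($rules, '', $teststring);

echo $data, PHP_EOL; // prints <p></p>
```

    Note that the iframe pattern has no `s` modifier, so `.` does not match newlines. If an iframe can span several lines in your XML, adding `s` (as the script rule already does) is probably what you want.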