php regex dom replace html-parsing

Replace a JavaScript document.write() call in an HTML document body with its HTML content


Is there an AND operator for PHP's regular expressions?

I'm trying to replace everything from document up to the ' and the ).

$output = preg_replace('/document.*\'/', '', $output);

I've tried to find a good regex tutorial, but I couldn't find one.

EDIT: Misunderstanding.

This is the code before the replacement:

<p>document.write(unescape('
<embed src="XXXXXX" type="application/x-shockwave-flash" wmode="window" width="712" height="475"%.35" allowFullScreen="true" ></embed>
')));</p>

I want to make it look like this:

<p>
<embed src="XXXXXX" type="application/x-shockwave-flash" wmode="window" width="712" height="475"%.35" allowFullScreen="true" ></embed>
</p>

The parts to be removed are:

document.write(unescape('

and

')));

Solution

  • What you actually want is to remove two parts and keep what lies between them. To avoid matching unintended parts, use explicit character classes:

    $output = preg_replace("/document[\w.(]+['](.*?)['][);]+/s", '$1', $output);
    

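    So it matches anything enclosed between (' and '), allowing for a varying number of closing parentheses and semicolons after the final quote. The capture group (.*?) holds the content in between, which '$1' puts back, and the /s modifier lets . match across line breaks.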
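
A minimal sketch of how that pattern behaves on the markup from the question. It assumes the HTML is already held in a string; the embed tag is shortened here for readability, and the variable names $pattern and $result are only illustrative:

```php
<?php
// Sample input modelled on the question; the embed attributes are shortened
// and the src value is the question's own placeholder.
$output = <<<'HTML'
<p>document.write(unescape('
<embed src="XXXXXX" type="application/x-shockwave-flash"></embed>
')));</p>
HTML;

// document      literal start of the call
// [\w.(]+       ".write(unescape(" (word characters, dots, opening parentheses)
// [']           the opening quote
// (.*?)         the wrapped markup, captured lazily as group 1
// ['][);]+      the closing quote followed by ")));"
// /s            lets "." match line breaks as well
$pattern = "/document[\w.(]+['](.*?)['][);]+/s";

// Replace the whole match with just the captured markup.
$result = preg_replace($pattern, '$1', $output);

echo $result;
```

With this input, only the opening and closing p tags and the embed line between them should remain. Note that the lazy (.*?) stops at the first ' that is followed by ) or ;, so markup that itself contains such a sequence would be cut short; a DOM-based approach may be more robust for arbitrary pages.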