php regex preg-match preg-match-all

Conditional lookahead for preg_match


I have the following code to extract JavaScript code:

preg_match_all('#<script(?:[^>]+)?>(.*?)</script>#is', $GLOBALS['content'], $matches, PREG_SET_ORDER)

It works well for these tags:

<script type="text/javascript">
<script type="application/javascript">
<script>

But how do I avoid matching this one?

<script type="application/ld+json">


Solution

  • Either use a negative lookahead, as @Wiktor suggests, or parse the HTML with a DOM parser and filter the nodes with XPath:

    <?php
    
    $data = <<<DATA
    <script type="text/javascript">some js code here</script>
    <script type="application/javascript">some other code here</script>
    <script>This looks naked, dude!</script>
    <script type="application/ld+json">THIS MUST NOT BE MATCHED</script>
    DATA;
    
    $dom = new DOMDocument();
    $dom->loadHTML($data);
    
    $xpath = new DOMXPath($dom);
    $scripts = $xpath->query("//script[not(@type='application/ld+json')]");
    foreach ($scripts as $script) {
        // Each node is a DOMElement; textContent holds the script body.
        echo $script->textContent, PHP_EOL;
    }
    ?>
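
  • For the negative-lookahead route, a minimal sketch might look like the following. The sample input and the `$content`, `$pattern`, and `$bodies` names are assumptions for illustration (the question reads from `$GLOBALS['content']`), and the pattern assumes the `type` value is quoted. The lookahead sits right after `<script` and rejects any opening tag whose attributes contain `type="application/ld+json"`:

```php
<?php

// Hypothetical sample input mirroring the tags from the question.
$content = <<<HTML
<script type="text/javascript">some js code here</script>
<script type="application/javascript">some other code here</script>
<script>This looks naked, dude!</script>
<script type="application/ld+json">THIS MUST NOT BE MATCHED</script>
HTML;

// (?! ... ) is a negative lookahead: the match is abandoned if the opening
// tag contains type="application/ld+json" (single or double quotes).
$pattern = '#<script(?![^>]*\btype\s*=\s*["\']application/ld\+json["\'])[^>]*>(.*?)</script>#is';

preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

// With PREG_SET_ORDER, index 1 of each match is the captured script body.
$bodies = array_column($matches, 1);
print_r($bodies);
```

    This keeps the original one-call approach, but it is still a regex over HTML: unquoted attribute values or a literal `</script>` inside a string would trip it up, which is where the parser version is more robust.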