I have the following code to extract JavaScript code:
preg_match_all('#<script(?:[^>]+)?>(.*?)</script>#is', $GLOBALS['content'], $matches, PREG_SET_ORDER);
It works well for these tags:
<script type="text/javascript">
<script type="application/javascript">
<script>
But how do I avoid matching this one?
<script type="application/ld+json">
Either use a negative lookahead in the regex, as @Wiktor says, or use a parser:
<?php
$data = <<<DATA
<script type="text/javascript">some js code here</script>
<script type="application/javascript">some other code here</script>
<script>This looks naked, dude!</script>
<script type="application/ld+json">THIS MUST NOT BE MATCHED</script>
DATA;
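// Parse the markup into a DOM tree.
$dom = new DOMDocument();
$dom->loadHTML($data);
// Select every <script> whose type attribute is not application/ld+json;
// scripts without a type attribute are selected as well.
$xpath = new DOMXPath($dom);
$scripts = $xpath->query("//script[not(@type='application/ld+json')]");
foreach ($scripts as $script) {
    // $script->textContent holds the body of the script element
    # code...
}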
?>
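If you would rather stay with the regex, the negative-lookahead route mentioned above could look something like the sketch below. The helper name extractJsBlocks is made up for illustration, and the pattern assumes the type attribute holds exactly application/ld+json, optionally quoted; it is not a full HTML parser, so the DOM approach is still the safer choice for arbitrary markup.

```php
<?php
// Hypothetical helper (not from the original post): returns the bodies of
// all <script> elements except those typed application/ld+json.
function extractJsBlocks(string $html): array
{
    // (?! ... ) is a negative lookahead: right after "<script" it rejects the
    // tag if an attribute type=application/ld+json appears before the next ">".
    $pattern = '#<script\b(?![^>]*\btype\s*=\s*["\']?application/ld\+json)[^>]*>(.*?)</script>#is';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);
    // With PREG_SET_ORDER each match is [0 => full tag, 1 => script body].
    return array_column($matches, 1);
}

$data = <<<DATA
<script type="text/javascript">some js code here</script>
<script type="application/javascript">some other code here</script>
<script>This looks naked, dude!</script>
<script type="application/ld+json">THIS MUST NOT BE MATCHED</script>
DATA;

$blocks = extractJsBlocks($data);
```

With the sample data, $blocks should hold the three JavaScript bodies and skip the ld+json one.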