regex - preg_split getting inline script tags


I'm trying to separate individual inline script tags:

<script>
    console.log('hello');
    console.log('hi!');
    console.log('yo!');
</script>
<script type="text/javascript">
    console.log("this is another inline script");
    var hi = "cool";
    console.log(hi);
</script>

Here's the pattern that I'm using:

$scripts = preg_split('#(<script>.*?</script>|<script type="text/javascript>.*?</script>")#', $str);    

But I'm getting this result:

Array
(
    [0] =>     <script>
        console.log('hello');
        console.log('hi!');
        console.log('yo!');
    </script>
    <script type="text/javascript">
        console.log("this is another inline script");
        var hi = "cool";
        console.log(hi);
    </script>
)

While I'm expecting to get something like this:

Array
(
    [0] =>     <script>
        console.log('hello');
        console.log('hi!');
        console.log('yo!');
    </script>
    [1] =>
    <script type="text/javascript">
        console.log("this is another inline script");
        var hi = "cool";
        console.log(hi);
    </script>
)

What's wrong with the pattern that I'm using? Thanks in advance!

Update

If I use the s modifier I get something like this:

Array
(
    [0] => 
    [1] => 
<script type="text/javascript">
            console.log("this is another inline script");
            var hi = "cool";
            console.log(hi);
</script>
)

It manages to separate the two scripts, but the first script becomes an empty string.


Solution

  • I'm just gonna make a list:

    • . does not match newlines unless PCRE_DOTALL (the /s flag) is used.

    • preg_split discards whatever the pattern matches, so you also need the PREG_SPLIT_DELIM_CAPTURE option to keep the matched parts instead of throwing them away.

    • In your case you should use preg_match_all instead of preg_split.
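
To illustrate the first two points, here is a minimal sketch of preg_split with the s flag and PREG_SPLIT_DELIM_CAPTURE. The sample input, the simplified <script[^>]*> pattern, the extra PREG_SPLIT_NO_EMPTY flag and the whitespace filter are my own assumptions for the demo, not part of the question:

```php
<?php
// Sample input modelled on the question; $str is the question's variable name.
$str = <<<'HTML'
<script>
    console.log('hello');
</script>
<script type="text/javascript">
    console.log("this is another inline script");
</script>
HTML;

// s: let . match newlines, i: match tag names case-insensitively.
// The outer parentheses capture the delimiter so it can be returned.
$pattern = '#(<script[^>]*>.*?</script>)#si';

// DELIM_CAPTURE keeps the captured <script> blocks in the result;
// NO_EMPTY drops the empty strings before and after them.
$parts = preg_split($pattern, $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

// The newline between the two blocks is still returned as a piece, so filter
// out whitespace-only entries and reindex.
$scripts = array_values(array_filter($parts, function ($part) {
    return trim($part) !== '';
}));

print_r($scripts);
```

This should leave one array entry per script block. The amount of cleanup needed is a fair hint that preg_split is the wrong tool here.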

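    And lastly, in anticipation of your next question, your expression does not match your source: the closing quote after text/javascript is missing (and a stray one follows the second </script>):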

    ...>|<script type="text/javascript>.*?<....
                                      ^
    

    In conclusion, you're better off using something like:

    preg_match_all("~( <script[^>]*>  (.*?)  </script> )~smix", $src, ...
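
A fuller sketch of how that call might be used. The arguments after $src are elided above, so the $matches variable, the PREG_SET_ORDER flag and the sample input below are assumptions for illustration only:

```php
<?php
// Sample input modelled on the question; $src is the name used in the answer.
$src = <<<'HTML'
<script>
    console.log('hello');
</script>
<script type="text/javascript">
    console.log("this is another inline script");
</script>
HTML;

// x: whitespace inside the pattern is ignored, s: . matches newlines,
// i: case-insensitive, m: multiline anchors (unused here, kept from the answer).
$found = preg_match_all(
    '~( <script[^>]*>  (.*?)  </script> )~smix',
    $src,
    $matches,
    PREG_SET_ORDER // assumed flag: one array per match, groups inside
);

foreach ($matches as $i => $m) {
    // $m[1] is the whole <script>...</script> element, $m[2] only its body.
    echo "Script #$i:\n", trim($m[2]), "\n\n";
}
```

With PREG_SET_ORDER each entry of $matches is one script, which lines up with the array shape expected in the question. Without that flag the default PREG_PATTERN_ORDER groups results by capture group instead, so the full elements would be in $matches[1].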