Find PHP with REGEX


I need a REGEX that can find blocks of PHP code in a file. For example:

    <? print '<?xml version="1.0" encoding="UTF-8"?>';?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
    <head>
        <?php echo "stuff"; ?>
    </head>
    </html>

When parsed by the REGEX, this would return:

    array(
        "<? print '<?xml version=\"1.0\" encoding=\"UTF-8\"?>';?>",
        "<?php echo \"stuff\"; ?>"
    );

You can assume the PHP is valid.


Solution

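  • Use token_get_all instead of a regex. A regex cannot easily tell that the ?> inside the string literal in your first block is not a closing tag, whereas the tokenizer already knows. token_get_all returns the list of PHP language tokens for the given source, so you only need to iterate over that list, start a buffer at each open-tag token (T_OPEN_TAG or T_OPEN_TAG_WITH_ECHO), and finish it at the next T_CLOSE_TAG.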

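    $blocks = array();   // completed PHP blocks, in source order
    $opened = false;     // true while we are inside a PHP block
    $buffer = '';        // text of the block currently being collected
    foreach (token_get_all($code) as $token) {
        if (!$opened) {
            // Outside PHP: skip inline HTML until an open tag appears.
            if (is_array($token) && ($token[0] === T_OPEN_TAG || $token[0] === T_OPEN_TAG_WITH_ECHO)) {
                $opened = true;
                $buffer = $token[1];
            }
        } else {
            if (is_array($token)) {
                // Array tokens are array(id, text, line).
                $buffer .= $token[1];
                if ($token[0] === T_CLOSE_TAG) {
                    $opened = false;
                    $blocks[] = $buffer;
                }
            } else {
                // Single-character tokens such as ";" are plain strings.
                $buffer .= $token;
            }
        }
    }
    // A file may end without a closing tag; keep that final block too.
    if ($opened) {
        $blocks[] = $buffer;
    }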
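
    To try the loop above on a concrete input, it can be wrapped in a function. The following is only a sketch: the function name extract_php_blocks and the sample markup are made up for illustration and are not part of the original answer. It uses <?php and <?= tags so that it does not depend on the short_open_tag setting.

```php
<?php
/**
 * Hypothetical helper (the name is illustrative): returns every PHP block
 * found in $code, from its open tag through its close tag.
 */
function extract_php_blocks(string $code): array
{
    $blocks = [];
    $buffer = null; // null means we are currently outside a PHP block

    foreach (token_get_all($code) as $token) {
        // Array tokens are [id, text, line]; other tokens are plain strings.
        $id   = is_array($token) ? $token[0] : null;
        $text = is_array($token) ? $token[1] : $token;

        if ($buffer === null) {
            // Ignore inline HTML until an open tag starts a new block.
            if ($id === T_OPEN_TAG || $id === T_OPEN_TAG_WITH_ECHO) {
                $buffer = $text;
            }
            continue;
        }

        $buffer .= $text;
        if ($id === T_CLOSE_TAG) {
            $blocks[] = $buffer;
            $buffer = null;
        }
    }

    // Keep a trailing block that has no closing tag.
    if ($buffer !== null) {
        $blocks[] = $buffer;
    }
    return $blocks;
}

// Sample input (an assumption for this sketch), held in a nowdoc so the
// tags inside it are plain data for the script itself.
$sample = <<<'HTML'
<html>
<head>
    <?php echo "stuff"; ?>
    <title><?= 'a "?>" inside a string' ?></title>
</head>
</html>
HTML;

print_r(extract_php_blocks($sample));
```

    Two details are worth knowing when reading the result. A newline that directly follows a closing tag belongs to the T_CLOSE_TAG token, so a block can end with that newline; call rtrim on the blocks if that matters. Also, a bare <? is only tokenized as an open tag when short_open_tag is enabled, so the first block in the question is found only under that setting.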