I need a REGEX that can find blocks of PHP code in a file. For example:
<? print '<?xml version="1.0" encoding="UTF-8"?>';?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<?php echo "stuff"; ?>
</head>
</html>
When parsed by the REGEX, this would return:
array(
"<? print '<?xml version=\"1.0\" encoding=\"UTF-8\"?>';?>",
"<?php echo \"stuff\"; ?>"
);
You can assume the PHP is valid.
With token_get_all you get a list of the PHP language tokens in a given piece of PHP code. Then you just need to iterate over the list, looking for the open tag tokens and their corresponding close tags.
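$blocks = array();
$opened = false;
$buffer = '';
foreach (token_get_all($code) as $token) {
    if (!$opened) {
        // Outside PHP: wait for "<?php" or "<?=" (or "<?" when short_open_tag is on).
        if (is_array($token) && ($token[0] === T_OPEN_TAG || $token[0] === T_OPEN_TAG_WITH_ECHO)) {
            $opened = true;
            $buffer = $token[1];
        }
    } else {
        if (is_array($token)) {
            // Array tokens carry their source text at index 1.
            $buffer .= $token[1];
            if ($token[0] === T_CLOSE_TAG) {
                $opened = false;
                $blocks[] = $buffer;
            }
        } else {
            // Single-character tokens such as ";" are plain strings.
            $buffer .= $token;
        }
    }
}
// A file may legally end without "?>", so keep a block that is still open.
if ($opened) {
    $blocks[] = $buffer;
}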
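To try this on input like the one in the question, the loop can be wrapped in a function. The following is only a sketch: the name extract_php_blocks is made up for illustration, and the sample uses <?php throughout because a bare <? is only tokenized as an open tag when short_open_tag is enabled. It shows the main advantage over a regex: the ?> inside the quoted XML declaration is part of a string token, so it does not end the block.

```php
<?php
// Hypothetical helper (the name is an assumption, not part of the answer above).
// Returns every PHP block found in $code, from its open tag to its close tag.
function extract_php_blocks($code)
{
    $blocks = array();
    $opened = false;
    $buffer = '';
    foreach (token_get_all($code) as $token) {
        // Array tokens are array(id, text, line); other tokens are plain strings.
        $id   = is_array($token) ? $token[0] : null;
        $text = is_array($token) ? $token[1] : $token;
        if (!$opened) {
            if ($id === T_OPEN_TAG || $id === T_OPEN_TAG_WITH_ECHO) {
                $opened = true;
                $buffer = $text;
            }
            continue;
        }
        $buffer .= $text;
        if ($id === T_CLOSE_TAG) {
            $opened = false;
            $blocks[] = $buffer;
        }
    }
    // Keep a trailing block that was never closed with "?>".
    if ($opened) {
        $blocks[] = $buffer;
    }
    return $blocks;
}

// Sample input modelled on the question; nowdoc keeps it free of escaping.
$code = <<<'HTML'
<?php print '<?xml version="1.0" encoding="UTF-8"?>';?>
<head>
<?php echo "stuff"; ?>
</head>
HTML;

$blocks = extract_php_blocks($code);
foreach ($blocks as $block) {
    // The close tag token may include the newline that follows it, so trim for display.
    echo rtrim($block), "\n";
}
```

Everything outside the PHP tags arrives as T_INLINE_HTML tokens, which the loop skips. If the exact source text matters, keep the blocks untrimmed; the rtrim call is only there to tidy the printed output.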