
How can I break out segments of a 'PHP' file into raw PHP, -and- possible raw HTML -in order-


So I've got a concept of how to do this, but actually implementing it is a bit of a stumper for me, mostly due to my lack of regex experience. Let's get into it.

I'd like to 'parse' through a 'php' file that could contain something like the following:

<?php
function Something()
{
}
?>

<html>
<body>
<? Something(); ?>
</body>
</html>

<?php
// Some more code or something
?>

If interpreted exactly, the above is worthless gibberish, but it is a good example of what I'd like to be able to parse, or interpret...

The idea is that I would read the contents of the above file, and break it out into an ordered array of its respective pieces; while tracking what 'type' each 'segment' is, so that I can either simply echo it, or run an 'eval()' on it.

Effectively, I'd like to end up with an array something like this:

$FileSegments = array();

$FileSegments[0]['type'] = "PHP";
$FileSegments[0]['content'] = "
    function Something()
    {
    }";

$FileSegments[1]['type'] = "HTML";
$FileSegments[1]['content'] = "
    <html>
    <body>";

$FileSegments[2]['type'] = "PHP";
$FileSegments[2]['content'] = "Something();";

And so on...

The initial idea was to simply 'include()' or 'require()' the file in question, and grab its output from the output buffer - but it dawned on me that I would like to be able to inject some 'top level' variables into each one of these files before evaluating the code. To do this, I would have to 'eval()' my injected code, with the contents of the file after said injection - but in order to do this with the ability to handle raw HTML in the file too, I would have to basically write a temporary clone of the whole file, that just had my injected code written before the actual contents... Cumbersome, and slow.

I hope you're all following here... If not I can clarify...

The only other piece I feel I should note before finalizing this question is that I would like to retain any variables or symbols in general (for instance the 'Something()' function) created in earlier segments, such as 0 and 2, and pass them down to later ones, such as segment 4... I feel like this might be achievable using the extract method, and then manually writing in those pieces of data before my next segment executes, but again I'm shooting a little in the dark on that.

If anyone has a better approach, or can give me some brief code on just extracting these 'segments' out of a file, I would be ecstatic.

cheers

ETA: It dawns on me that I can probably pose this question a little more simply: If there isn't a 'simple' way to do the above, is there a way to handle a String in the exact same way that 'require()' and 'include()' handle a File?


Solution

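  • <?php
    $str = file_get_contents('filename.php');

    // initialise the result arrays so they exist even when nothing matches
    $php_full_result = array();
    $php_result = array();
    $html_result = array();

    // everything in $chunk before $marker, or the whole chunk if $marker is absent
    // (strpos() returns false in that case, which substr() would treat as length 0)
    function text_before($chunk, $marker)
    {
        $pos = strpos($chunk, $marker);
        return $pos === false ? $chunk : substr($chunk, 0, $pos);
    }

    // split on each delimiter; the first piece is the text before the
    // first delimiter, so it is dropped here
    $php_full = array_slice(explode('<?php', $str), 1);
    $php = array_slice(explode('<?', $str), 1);
    $html = array_slice(explode('?>', $str), 1);

    // '<?php' blocks: keep everything up to the closing tag
    foreach ($php_full as $value) {
        $php_full_result[] = text_before($value, '?>');
    }

    // short-tag '<?' blocks: skip the pieces that came from '<?php'
    foreach ($php as $value) {
        if (strpos($value, 'php') !== 0) {
            $php_result[] = text_before($value, '?>');
        }
    }

    // HTML before the first opening tag, then HTML after each closing tag
    $html_result[] = text_before($str, '<?');
    foreach ($html as $value) {
        $html_result[] = text_before($value, '<?');
    }

    // drop empty pieces
    $html_result = array_filter($html_result);

    echo '<pre>';
    print_r($php_full_result);
    echo '</pre>';

    echo '<pre>';
    print_r($php_result);
    echo '</pre>';

    echo '<pre>';
    var_dump($html_result);
    echo '</pre>';

    ?>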
    

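    This will give you 3 arrays of the file segments you want: the '<?php' blocks, the short-tag '<?' blocks, and the HTML pieces. It is not the exact format you asked for (the three arrays do not record the original order), but you can easily adapt these arrays to your needs.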
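
If the order of the segments matters, one possible alternative to string splitting is PHP's built-in tokenizer. The sketch below is only an illustration under a few assumptions: `split_segments` is a hypothetical helper name, the tokenizer extension is available, and a short tag such as `<? ... ?>` is only recognised as PHP when the `short_open_tag` ini setting is on (otherwise the tokenizer reports it as inline HTML).

```php
<?php
// Hypothetical helper: split source text into ordered segments shaped like
// ['type' => 'PHP'|'HTML', 'content' => string], in document order.
function split_segments($source)
{
    $segments = array();
    $buffer = '';   // PHP code collected since the last opening tag
    $echo = false;  // true when the current block was opened with '<?='

    foreach (token_get_all($source) as $token) {
        // tokens are either [id, text, line] arrays or single-character strings
        $id = is_array($token) ? $token[0] : null;
        $text = is_array($token) ? $token[1] : $token;

        if ($id === T_INLINE_HTML) {
            $segments[] = array('type' => 'HTML', 'content' => $text);
        } elseif ($id === T_OPEN_TAG || $id === T_OPEN_TAG_WITH_ECHO) {
            $buffer = '';
            $echo = ($id === T_OPEN_TAG_WITH_ECHO);
        } elseif ($id === T_CLOSE_TAG) {
            $code = trim($buffer);
            // '<?= expr ?>' is shorthand for 'echo expr;'
            $segments[] = array('type' => 'PHP', 'content' => $echo ? 'echo ' . $code . ';' : $code);
            $buffer = '';
            $echo = false;
        } else {
            $buffer .= $text;
        }
    }

    // a file may end inside a PHP block that has no closing tag
    if (trim($buffer) !== '') {
        $segments[] = array('type' => 'PHP', 'content' => trim($buffer));
    }

    return $segments;
}
```

Each HTML segment can then be echoed as-is, and each PHP segment holds code without its tags. Unlike splitting on `?>`, the tokenizer is not confused by a `?>` that appears inside a string literal.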

    For the "I'd like to break all of my '$GLOBALS' variables out into their 'simple' names" part, you can call extract in the scope where the segment runs (for example, inside a function), like this:

    extract($GLOBALS);
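
On the "ETA" part of the question (treating a string the way `include()` treats a file), one commonly used approach is to prefix the string with `?>` and pass it to `eval()`, so that parsing starts in HTML mode just as it does for a file. The sketch below combines that with `extract()` and output buffering. `render_string` and its double-underscore parameter names are hypothetical, chosen only to reduce the chance of clashing with injected variable names, and `eval()` should only ever see trusted input.

```php
<?php
// Hypothetical helper: run a mixed PHP/HTML string as include() would run a
// file, with $__vars injected as local variables, and return the output.
function render_string($__code, array $__vars = array())
{
    // EXTR_SKIP: never overwrite a variable that already exists in this scope
    // (so $__code and $__vars themselves are safe)
    extract($__vars, EXTR_SKIP);

    ob_start();
    try {
        // the leading '?>' makes eval() start outside PHP mode, like a file
        eval('?>' . $__code);
        return ob_get_clean();
    } catch (Throwable $e) {
        ob_end_clean(); // discard partial output, then let the error propagate
        throw $e;
    }
}

// usage: inject a 'top level' variable into a template string
echo render_string('<p><?php echo $name; ?></p>', array('name' => 'World'));
```

Functions and classes declared by the evaluated string are defined globally, so they stay available to strings rendered later. Plain variables are local to each call; to carry them forward they would have to be collected (for instance with `get_defined_vars()`) and passed in again.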