
Prevent loading from remote source if file is larger than a given size


Let's say I only want XML files of up to 10 MB to be loaded from a remote server.

Something like:

$xml_file = "http://example.com/largeXML.xml";// size= 500MB

//PRACTICAL EXAMPLE: $xml_file = "http://www.cs.washington.edu/research/xmldatasets/data/pir/psd7003.xml";// size= 683MB

/* GOAL: do whatever can be done to stop this large file from being loaded by DOMDocument, without having to download the whole file and check it first */

$dom =  new DOMDocument();

$dom->load($xml_file /* load ONLY if the file size is <= 10 MB, else echo 'File is too large' */);

How can this be achieved? Any ideas, alternatives, or a best approach would be highly appreciated.

I checked the question "PHP: Remote file size without downloading file", but when I try something like:

var_dump(
    curl_get_file_size(
        "http://www.dailymotion.com/rss/user/dialhainaut/"
    )
);

I get string 'unknown' (length=7)
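If cURL is already in use, one option that does not depend on Content-Length at all is to let the transfer run and abort it from a CURLOPT_WRITEFUNCTION callback once the byte count passes the limit. The sketch below assumes that behaviour (cURL stops with a write error when the callback returns a number different from the length of the data it received); `make_limited_writer()` and `fetch_xml_limited()` are hypothetical names, not part of any library.

```php
<?php
/**
 * Hypothetical helper: build a CURLOPT_WRITEFUNCTION callback that appends
 * incoming data to $buffer and aborts once more than $max_B bytes would be stored.
 */
function make_limited_writer(string &$buffer, int $max_B, bool &$too_big): callable
{
    return function ($ch, string $data) use (&$buffer, $max_B, &$too_big): int {
        if (strlen($buffer) + strlen($data) > $max_B) {
            $too_big = true;
            // Returning a count different from strlen($data) makes cURL abort
            return 0;
        }
        $buffer .= $data;
        return strlen($data);
    };
}

/**
 * Hypothetical wrapper: download $url with cURL and return the body as a string,
 * or null if the transfer failed or exceeded $max_B bytes.
 */
function fetch_xml_limited(string $url, int $max_B): ?string
{
    $buffer = '';
    $too_big = false;
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_FAILONERROR, true); // treat HTTP >= 400 as a failure
    curl_setopt($ch, CURLOPT_WRITEFUNCTION, make_limited_writer($buffer, $max_B, $too_big));
    // With a write callback set, curl_exec() returns true/false instead of the body
    $ok = curl_exec($ch);
    if ($ok === false || $too_big) {
        return null;
    }
    return $buffer;
}
```

The download stops shortly after the limit is crossed, so at most about $max_B bytes plus one network chunk are transferred.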

When I try get_headers() as suggested in another answer, the Content-Length header is missing from the response, so this will not work reliably either.
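When a server does send Content-Length, checking it first can avoid the download entirely, but the result has to be treated as "unknown" when the header is absent. A minimal sketch, assuming the associative format of get_headers() (where a header repeated across redirects comes back as an array); `content_length_from_headers()` and `remote_size_within()` are hypothetical names. One known limitation: if only a redirect response carries the header, this can report the redirect's length rather than the final file's.

```php
<?php
/**
 * Hypothetical helper: extract the Content-Length (in bytes) from the array
 * returned by get_headers($url, true). Returns null when no usable value exists.
 */
function content_length_from_headers(array $headers): ?int
{
    // Header names keep the server's capitalisation, so compare case-insensitively
    $headers = array_change_key_case($headers, CASE_LOWER);
    if (!isset($headers['content-length'])) {
        return null;
    }
    $value = $headers['content-length'];
    // After redirects, a header sent by several responses becomes an array;
    // the last entry normally belongs to the final response
    if (is_array($value)) {
        $value = end($value);
    }
    $value = trim((string) $value);
    return preg_match('/^\d+$/D', $value) === 1 ? (int) $value : null;
}

/**
 * Cheap pre-check using a HEAD request: true/false when the size is known,
 * null when the server did not report it (a bounded download is still needed).
 */
function remote_size_within(string $url, int $max_B): ?bool
{
    $context = stream_context_create(['http' => ['method' => 'HEAD']]);
    $headers = @get_headers($url, true, $context);
    if ($headers === false) {
        return null;
    }
    $length = content_length_from_headers($headers);
    return $length === null ? null : $length <= $max_B;
}
```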

Please advise how to determine the file's length and avoid passing it to DOMDocument if it exceeds 10 MB.


Solution

  • OK, it's finally working. The headers approach was never going to work broadly, because servers are not required to send a Content-Length header (chunked responses, for example, omit it). In this solution we open a file handle and read the XML in fixed-size chunks, keeping a running byte count, until it exceeds $max_B. If the file is too big, we still pay for downloading it up to the 10 MB mark, but it works as expected. If the file is no larger than $max_B, it proceeds...

    $xml_file = "http://www.dailymotion.com/rss/user/dialhainaut/";
    //$xml_file = "http://www.cs.washington.edu/research/xmldatasets/data/pir/psd7003.xml";
    
    $max_B = 10 * 1024 * 1024; // 10 MB = 10485760 bytes
    
    // @ suppresses the warning PHP emits when the URL cannot be opened
    $fh = @fopen($xml_file, "rb");
    
    if ($fh) {
        $file_string = '';
        $total_B = 0;
        // Read fixed-size chunks rather than lines: an XML file may contain no
        // newlines at all, and fgets() would then pull the whole file into memory
        while (!feof($fh)) {
            $chunk = fread($fh, 8192);
            if ($chunk === false) {
                break;
            }
            $total_B += strlen($chunk);
            if ($total_B > $max_B) {
                break; // stop downloading as soon as the limit is passed
            }
            $file_string .= $chunk;
        }
        fclose($fh);
    
        if ($total_B <= $max_B) {
            echo 'File ok. Total size = '.$total_B.' bytes. Proceeding...';
            //proceed
            $dom = new DOMDocument();
            //NOTE the method change: loadXML() parses a string, load() reads a file/URL
            //(loadXML() rejects an empty string, hence the first check)
            if ($file_string === '' || !$dom->loadXML($file_string)) {
                echo 'Not a well-formed XML document!';
            }
        } else {
            //reject
            echo 'File too big! Max size = '.$max_B.' bytes.';
        }
    } else {
        echo 'Could not open '.$xml_file.'!';
    }
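A shorter variant of the same bounded-read idea, sketched under the assumption that only a yes/no on the size is needed: stream_get_contents() accepts a maximum length, so asking for $max_B + 1 bytes reveals whether the source is over the limit without a manual loop, and nothing beyond that extra byte is read. `read_limited()` and `load_xml_limited()` are hypothetical helper names.

```php
<?php
/**
 * Hypothetical helper: read at most $max_B bytes from an already-open stream.
 * It asks for one byte more than the limit; if that extra byte arrives, the
 * source is too big and null is returned (null is also returned on a read error).
 */
function read_limited($fh, int $max_B): ?string
{
    $data = stream_get_contents($fh, $max_B + 1);
    if ($data === false || strlen($data) > $max_B) {
        return null;
    }
    return $data;
}

/**
 * Hypothetical wrapper: open a path or URL and return a DOMDocument, or null if
 * it cannot be opened, exceeds $max_B bytes, is empty, or is not well-formed XML.
 */
function load_xml_limited(string $url, int $max_B): ?DOMDocument
{
    $fh = @fopen($url, 'rb');
    if ($fh === false) {
        return null;
    }
    $xml = read_limited($fh, $max_B);
    fclose($fh);
    // loadXML() rejects an empty string, so treat it like any other failure
    if ($xml === null || $xml === '') {
        return null;
    }
    $dom = new DOMDocument();
    return $dom->loadXML($xml) ? $dom : null;
}
```

Usage would be `$dom = load_xml_limited($xml_file, 10 * 1024 * 1024);`. Because every failure collapses to null, a real application may prefer to distinguish "too big" from "unreachable" or "invalid XML".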