I have some fairly large XML files that are used for a scheduled import, and I use cron to parse them. The problem is that processing takes too long and always exceeds PHP's "max_execution_time". Since I use XMLReader, which lets me read the XML "line by line", the only solution I see is to track the current node, store its position, and skip to that offset on the next cron run.
This is what I have now:
$xml = new XMLReader;
$xml->open($file);
$pointer = 0;
while ($xml->read()) {
    if ($xml->nodeType == XMLReader::ELEMENT && $xml->localName == 'Product') {
        $chunk = array();
        $chunk['ProductID'] = $xml->getAttribute('ProductID');
        $chunk['ProductName'] = $xml->getAttribute('ProductName');
        process_import($chunk); // Process the received data
        save_current_node_in_BD($pointer++); // Store the current position in the DB
    }
}
$xml->close();
Is it a good idea to use $pointer++ to count processed nodes? How do I set an offset for the next cron run?
First of all, when you run PHP from cron, you normally use the CLI version, which has a default max_execution_time of 0 (disabled). If that is not your case and you can't change the limit, continue reading.
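To check which case applies, the cron script can inspect its own SAPI and time limit before doing any work. This is only a sketch: describe_time_limit() and try_disable_time_limit() are hypothetical helper names, not part of the original script.

```php
<?php
// Hypothetical helpers for illustration; they are not part of the original script.

// Describe the SAPI and the time limit currently in effect (0 means no limit).
function describe_time_limit(): string
{
    $limit = (int) ini_get('max_execution_time');
    return sprintf('sapi=%s max_execution_time=%d', PHP_SAPI, $limit);
}

// Return true if the script can run without a time limit.
function try_disable_time_limit(): bool
{
    if ((int) ini_get('max_execution_time') === 0) {
        return true; // already unlimited, the usual case under the CLI
    }
    // set_time_limit() may be unavailable on some hosts; it returns false on failure.
    return function_exists('set_time_limit') && @set_time_limit(0);
}
```

If try_disable_time_limit() returns true at the top of the cron script, the import can simply run to completion and no resume logic is needed.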
If your XML can be parsed within the time limit (parsing only, no processing), you can have two crons:
The pile can be implemented in a few ways, amongst which:
Edit
If you can't disable the execution time limit, you can keep a small file containing the file name and position. On each run, open this file to determine whether there is still work to be done. To make sure the file is saved when the time runs out, you need to register a shutdown function.
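A minimal sketch of that idea follows. The function names and the JSON format of the state file are assumptions made for illustration, and the callback stands in for process_import() from the question. As far as I know XMLReader cannot seek, so the skipped nodes are still parsed on every run, but they are not processed again.

```php
<?php
// Sketch only: the function names and the JSON state format are assumptions.

// Read the saved state, falling back to "nothing done yet".
function load_state(string $stateFile): array
{
    $default = ['file' => null, 'position' => 0];
    if (!is_file($stateFile)) {
        return $default;
    }
    $state = json_decode((string) file_get_contents($stateFile), true);
    return is_array($state) ? $state + $default : $default;
}

// Persist the XML file name and the number of <Product> nodes already imported.
function save_state(string $stateFile, string $xmlFile, int $position): void
{
    file_put_contents(
        $stateFile,
        json_encode(['file' => $xmlFile, 'position' => $position]),
        LOCK_EX
    );
}

// Import <Product> nodes, skipping the first $offset; $position tracks progress.
function resume_import(string $xmlFile, int $offset, callable $process, int &$position): void
{
    $xml = new XMLReader();
    if (!$xml->open($xmlFile)) {
        throw new RuntimeException("Cannot open $xmlFile");
    }
    $seen = 0;
    while ($xml->read()) {
        if ($xml->nodeType !== XMLReader::ELEMENT || $xml->localName !== 'Product') {
            continue;
        }
        if ($seen++ < $offset) {
            continue; // imported on an earlier run
        }
        $process([
            'ProductID'   => $xml->getAttribute('ProductID'),
            'ProductName' => $xml->getAttribute('ProductName'),
        ]);
        $position = $seen; // only advanced after the product is fully processed
    }
    $xml->close();
}

// Wire everything together for one cron run. Use absolute paths, since the
// working directory is not guaranteed to be the same inside a shutdown function.
function run_import(string $xmlFile, string $stateFile, callable $process): void
{
    $state = load_state($stateFile);
    $offset = $state['file'] === $xmlFile ? (int) $state['position'] : 0;
    $position = $offset;
    // Shutdown functions also run when max_execution_time is exceeded.
    register_shutdown_function(function () use ($stateFile, $xmlFile, &$position) {
        save_state($stateFile, $xmlFile, $position);
    });
    resume_import($xmlFile, $offset, $process, $position);
}
```

The cron script would then call something like run_import('/path/to/import.xml', '/path/to/import.state', 'process_import'), where both paths are placeholders. If the time limit hits in the middle of a product, that product is not counted and will be processed again on the next run, so process_import() should tolerate being called twice for the same ProductID.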