Parse huge XML - remember last successfully processed node to set offset on next run

I have some fairly large XML files that are used for a scheduled import, and I parse them from cron. The problem is that processing takes too long and always exceeds PHP's "max_execution_time". Since I use XMLReader, which reads the XML node by node, the only solution I see is to track the current node, store its position, and resume from that offset on the next cron run.

Here is what I have now:

  $xml = new XMLReader;
  $xml->open($file);
  $pointer = 0; // number of <Product> elements handled so far

  while ($xml->read()) {

    if ($xml->nodeType == XMLReader::ELEMENT && $xml->localName == 'Product') {
      $chunk = array();
      $chunk['ProductID'] = $xml->getAttribute('ProductID');
      $chunk['ProductName'] = $xml->getAttribute('ProductName');
      process_import($chunk); // process the received data
      save_current_node_in_BD($pointer++); // store the current position in the DB
    }
  }
  $xml->close();

Is it a good idea to use $pointer++ to count processed nodes? How do I set an offset for the next cron run?

Solution

First of all, when you run PHP from cron you normally use the CLI version, which has a default max_execution_time of 0 (no limit). If a limit is imposed anyway and you can't change it, continue reading.
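
To check which case applies, a small script run from the same cron entry can report the SAPI and the effective limit. This is only a sketch: describe_time_limit() is a hypothetical helper name, and the set_time_limit() call is guarded because some hosts disable that function.

```php
<?php
// Report which SAPI is running and the effective execution time limit.
// describe_time_limit() is a hypothetical helper, not part of PHP.
function describe_time_limit(): string
{
    $limit = (int) ini_get('max_execution_time');
    return sprintf('%s: %s', PHP_SAPI, $limit === 0 ? 'no time limit' : $limit . ' s');
}

echo describe_time_limit(), PHP_EOL;

// If the host allows it, lift the limit for this run only.
if (function_exists('set_time_limit')) {
    set_time_limit(0);
}
```

If the script reports the cli SAPI with no time limit, the import can simply run to completion and none of the workarounds below are needed.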

If your XML can be parsed within the time limit (parsing only, no processing), you can have two crons:

The first cron will parse the XML and dump new tasks onto a pile.
The second cron will take work from the pile, process it and then remove it from the pile.

The pile can be implemented in several ways, for example:

A database table
A directory of work items (each work item is one file)
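
The two-cron idea with a directory as the pile could look roughly like the following. It is a minimal sketch, not a finished importer: enqueue_products() and process_queue() are hypothetical names, the JSON file layout is an assumption, and $handler stands in for your own process_import().

```php
<?php
// Cron 1: parse only. Writes one small JSON file per <Product> into $queueDir.
function enqueue_products(string $xmlFile, string $queueDir): int
{
    if (!is_dir($queueDir)) {
        mkdir($queueDir, 0777, true);
    }
    $xml = new XMLReader();
    if (!$xml->open($xmlFile)) {
        throw new RuntimeException("Cannot open $xmlFile");
    }
    $count = 0;
    while ($xml->read()) {
        if ($xml->nodeType === XMLReader::ELEMENT && $xml->localName === 'Product') {
            $item = [
                'ProductID'   => $xml->getAttribute('ProductID'),
                'ProductName' => $xml->getAttribute('ProductName'),
            ];
            // A zero-padded counter keeps the work items in document order.
            $name = sprintf('%s/%010d.json', $queueDir, $count++);
            file_put_contents($name, json_encode($item));
        }
    }
    $xml->close();
    return $count;
}

// Cron 2: take at most $maxItems work items, process each, then remove it.
function process_queue(string $queueDir, callable $handler, int $maxItems = 500): int
{
    $files = glob($queueDir . '/*.json') ?: [];
    sort($files);
    $done = 0;
    foreach (array_slice($files, 0, $maxItems) as $file) {
        $handler(json_decode(file_get_contents($file), true));
        unlink($file); // removed only after the handler returned
        $done++;
    }
    return $done;
}
```

Because a work item is deleted only after the handler returns, an interrupted run leaves the unfinished item on the pile and the next run picks it up again, so the handler should tolerate seeing the same product twice. The sketch also assumes the first cron does not enqueue the same file again while the pile is still being worked through.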

Edit

If you can't disable the execution time limit, you can keep a small state file containing the file name and the position reached. At the start of each run, read this file to determine whether there is still work to be done. To make sure the file is saved when the time runs out, register a shutdown function.
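
A minimal sketch of that state file approach follows. XMLReader cannot seek, so the "offset" here is simply the number of Product elements already imported, and a resumed run reads past that many without processing them (parsing alone is cheap compared to the import). The names load_state(), save_state() and import_with_resume(), the JSON state format and the $maxItems parameter are all assumptions made for illustration; $process stands in for process_import().

```php
<?php
// Returns how many products were already imported from $xmlFile (0 if unknown).
function load_state(string $stateFile, string $xmlFile): int
{
    if (!is_file($stateFile)) {
        return 0;
    }
    $state = json_decode(file_get_contents($stateFile), true);
    // Start over when the saved state belongs to a different file.
    return (is_array($state) && ($state['file'] ?? null) === $xmlFile) ? (int) $state['done'] : 0;
}

function save_state(string $stateFile, string $xmlFile, int $done): void
{
    file_put_contents($stateFile, json_encode(['file' => $xmlFile, 'done' => $done]), LOCK_EX);
}

// Imports products not yet handled; returns how many were handled in this run.
function import_with_resume(string $xmlFile, string $stateFile, callable $process, int $maxItems = PHP_INT_MAX): int
{
    $offset = load_state($stateFile, $xmlFile);
    $done = $offset;
    $finished = false;
    // Shutdown functions also run after a "maximum execution time exceeded"
    // fatal error, so progress is saved even when the run is cut off.
    register_shutdown_function(function () use (&$done, &$finished, $stateFile, $xmlFile) {
        if (!$finished) {
            save_state($stateFile, $xmlFile, $done);
        }
    });

    $xml = new XMLReader();
    if (!$xml->open($xmlFile)) {
        throw new RuntimeException("Cannot open $xmlFile");
    }
    $index = 0;   // position of the current <Product> in the document
    $handled = 0; // products processed during this run
    while ($xml->read()) {
        if ($xml->nodeType !== XMLReader::ELEMENT || $xml->localName !== 'Product') {
            continue;
        }
        if ($index++ < $offset) {
            continue; // already imported by an earlier run
        }
        if ($handled >= $maxItems) {
            break;
        }
        $process([
            'ProductID'   => $xml->getAttribute('ProductID'),
            'ProductName' => $xml->getAttribute('ProductName'),
        ]);
        $done++; // counted only after processing succeeded
        $handled++;
    }
    $xml->close();

    save_state($stateFile, $xmlFile, $done);
    $finished = true;
    return $handled;
}
```

So counting processed nodes is a workable offset as long as the file does not change between runs. If the time limit hits in the middle of processing a product, that product is not counted and will be processed again on the next run, so the import step should be safe to repeat. It is also safer to pass absolute paths, since the working directory may differ by the time a shutdown function runs.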