
SimpleXML vs DOMDocument performance


I am building an RSS parser using the SimpleXML class, and I was wondering whether switching to the DOMDocument class would make the parser faster. I am parsing an RSS document of at least 1000 lines, and I use almost all of the data in it. I am looking for the approach that takes the least time to complete.


Solution

  • SimpleXML and DOMDocument both use the same parser (libxml2), so the parsing difference between them is negligible.

    This is easy to verify:

    function time_load_dd($xml, $reps) {
        // Warm-up: discard the first few runs to prime caches.
        for ($i = 0; $i < 5; ++$i) {
            $dom = new DOMDocument();
            $dom->loadXML($xml);
        }
        // Timed runs: parse the same string $reps times with DOMDocument.
        $start = microtime(true);
        for ($i = 0; $i < $reps; ++$i) {
            $dom = new DOMDocument();
            $dom->loadXML($xml);
        }
        // Elapsed wall-clock time in seconds.
        return microtime(true) - $start;
    }
    function time_load_sxe($xml, $reps) {
        // Warm-up: discard the first few runs to prime caches.
        for ($i = 0; $i < 5; ++$i) {
            $sxe = simplexml_load_string($xml);
        }
        // Timed runs: parse the same string $reps times with SimpleXML.
        $start = microtime(true);
        for ($i = 0; $i < $reps; ++$i) {
            $sxe = simplexml_load_string($xml);
        }
        // Elapsed wall-clock time in seconds.
        return microtime(true) - $start;
    }
    
    
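    function main() {
        // This is a 1800-line Atom feed of some complexity.
        $url = 'http://feeds.feedburner.com/reason/AllArticles';
        $xml = file_get_contents($url);
        if ($xml === false) {
            // Without this check, a failed download would benchmark parse errors.
            exit("Could not fetch $url\n");
        }
        $reps = 10000;
        $methods = array('time_load_dd', 'time_load_sxe');
        echo "Time to complete $reps reps:\n";
        foreach ($methods as $method) {
            // Call each timing function by name and print its elapsed seconds.
            echo $method, ": ", $method($xml, $reps), "\n";
        }
    }
    main();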
    

    On my machine there is essentially no difference, about 1.7 to 1.8 ms per parse with either class:

    Time to complete 10000 reps:
    time_load_dd: 17.725028991699
    time_load_sxe: 17.416455984116
    

    The real issue is which algorithms you use and what you do with the data. A 1000-line file is not a big XML document, so your slowdown will come not from memory usage or parsing speed but from your application logic.
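To check where the time actually goes, one option is to time the parse step and the processing step separately. The following is only a minimal sketch under stated assumptions: the inline two-item feed, the `extract_items` function (standing in for whatever your application does with the data), and the `time_callable` helper are all hypothetical names that are not part of the original benchmark.

```php
<?php
// Hypothetical two-item feed, inlined so the sketch runs without network access.
$xml = <<<FEED
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>
FEED;

// Example application logic (an assumption): collect title/link pairs from every item.
function extract_items(SimpleXMLElement $feed): array {
    $items = [];
    foreach ($feed->channel->item as $item) {
        $items[] = [
            'title' => (string) $item->title,
            'link'  => (string) $item->link,
        ];
    }
    return $items;
}

// Run $fn $reps times and return the elapsed wall-clock seconds.
function time_callable(callable $fn, int $reps): float {
    $start = microtime(true);
    for ($i = 0; $i < $reps; ++$i) {
        $fn();
    }
    return microtime(true) - $start;
}

$reps = 1000;
$feed = simplexml_load_string($xml);

// Time parsing alone.
$parseTime = time_callable(function () use ($xml) {
    simplexml_load_string($xml);
}, $reps);

// Time the processing step alone, on an already parsed document.
$extractTime = time_callable(function () use ($feed) {
    extract_items($feed);
}, $reps);

printf("parse:   %.4f s for %d reps\n", $parseTime, $reps);
printf("extract: %.4f s for %d reps\n", $extractTime, $reps);
```

Replacing `extract_items` with your real processing code, and the inline string with your real feed, should show whether parsing or your own logic dominates before you consider switching XML classes.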