Tags: xml, boost, xml-parsing, boost-propertytree

Boost XML parser RAM consumption


I decided to check the memory usage of PropertyTree for XML parsing with the piece of code below. The XML file is a little over 120 MB, but the program had already consumed more than 2 GB when I decided to kill it. Is this the normal memory consumption of PropertyTree, or is something wrong?

#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/xml_parser.hpp>
#include <boost/foreach.hpp>
#include <iostream>

int main()
{
  using boost::property_tree::ptree;
  ptree pt;
  read_xml("c.xml",pt);
  return 0;
}

Solution

  • Running your exact snippet, compiled with GCC 4.8 on 64-bit Linux and using the 117 MiB input XML here, I get a peak memory usage of 2.1 GiB:

    [image: memory usage graph for the Boost PropertyTree run]

    The whole thing executes in roughly 4 to 14 s, depending on optimization flags. With tcmalloc it even drops to 2.7 s.
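    If you want to check a peak-memory figure like this from inside the program, here is a minimal sketch. It is not necessarily how the numbers above were obtained, and `peak_rss_kib` is a hypothetical helper name. It assumes Linux, where `getrusage` reports `ru_maxrss` in KiB (macOS reports bytes).

```cpp
#include <sys/resource.h>  // getrusage, RUSAGE_SELF (POSIX)
#include <cassert>

// Hypothetical helper: peak resident set size of the current process.
// Assumption: Linux, where ru_maxrss is given in KiB.
long peak_rss_kib() {
    struct rusage usage {};
    const int rc = getrusage(RUSAGE_SELF, &usage);
    assert(rc == 0 && "getrusage(RUSAGE_SELF) is not expected to fail");
    (void)rc;  // silence unused-variable warnings when NDEBUG is defined
    return usage.ru_maxrss;
}
```

    Calling `peak_rss_kib()` right after `read_xml("c.xml", pt)` and printing the result should give a number in the same ballpark as an external profiler, though allocator and tool differences can shift it.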

    You can see that at least 50% of the memory is held directly in the ptree containers. In your PHP question you (correctly) mentioned that reading it all into a single DOM is just not such a great idea.

    Even so, if you use a more appropriate/capable library, like PugiXML, the execution is over 10x as fast and the memory usage is roughly 1/6th:

    [image: memory usage graph for the PugiXML run]

    Here's the code:

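    #include <pugixml.hpp>
    #include <iostream>

    int main() {
        pugi::xml_document doc;
        // load_file parses the whole file into an in-memory DOM
        pugi::xml_parse_result result = doc.load_file("input.xml");
        if (!result) {
            std::cerr << "Parse error: " << result.description() << "\n";
            return 1;
        }
    }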
    

    Imagine what happens if you optimize for memory usage by using a streaming API.
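    To make the streaming idea concrete, here is a rough sketch that reads the input in fixed-size chunks and only counts element start tags, so memory stays bounded by the chunk size instead of the document size. This is a hand-rolled illustration, not a real XML parser and not the API of any library mentioned above. `count_start_tags` is a hypothetical name. It assumes well-formed input, and comments, CDATA sections, processing instructions and DOCTYPE declarations are simply skipped up to the next `>`. For real work you would use a proper SAX or pull parser.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: counts element start tags while streaming the input.
// Only one chunk plus a few bytes of state are held in memory at any time.
// Simplification: anything starting with "<!" or "<?" is skipped up to the
// next '>', which is wrong if such a construct contains '>' itself.
std::size_t count_start_tags(std::istream& in, std::size_t chunk_size = 64 * 1024) {
    assert(chunk_size > 0);
    enum class State { Text, TagOpen, InTag, InQuote, Skip };
    State state = State::Text;
    char quote = '\0';
    std::size_t count = 0;
    std::string buf(chunk_size, '\0');

    // The state survives across chunks, so a tag may span a chunk boundary.
    while (in.read(&buf[0], static_cast<std::streamsize>(buf.size())) || in.gcount() > 0) {
        const std::size_t n = static_cast<std::size_t>(in.gcount());
        for (std::size_t i = 0; i < n; ++i) {
            const char c = buf[i];
            switch (state) {
            case State::Text:
                if (c == '<') state = State::TagOpen;
                break;
            case State::TagOpen:
                if (c == '/' || c == '?' || c == '!') {
                    state = State::Skip;   // end tag, PI, comment, CDATA, DOCTYPE
                } else {
                    ++count;               // first character of an element name
                    state = State::InTag;
                }
                break;
            case State::InTag:
                if (c == '>') state = State::Text;
                else if (c == '"' || c == '\'') { quote = c; state = State::InQuote; }
                break;
            case State::InQuote:           // attribute values may contain '>'
                if (c == quote) state = State::InTag;
                break;
            case State::Skip:
                if (c == '>') state = State::Text;
                break;
            }
        }
    }
    return count;
}

// Convenience wrapper for in-memory strings, handy for quick checks.
std::size_t count_start_tags(const std::string& xml) {
    std::istringstream in(xml);
    return count_start_tags(in);
}
```

    For a file you would pass a `std::ifstream` opened in binary mode. The point is only that nothing proportional to the document size is ever kept around, which is the property a streaming API gives you.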