I decided to check the memory usage of PropertyTree for XML parsing with the piece of code below. The XML file is a little over 120 MB, but the program was consuming over 2 GB when I decided to kill it. Is this normal memory consumption for PropertyTree, or is something wrong?
#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/xml_parser.hpp>
#include <boost/foreach.hpp>
#include <iostream>
int main()
{
    using boost::property_tree::ptree;
    ptree pt;
    read_xml("c.xml", pt);
    return 0;
}
Running your exact snippet, compiled with GCC 4.8 on 64-bit Linux and using a 117 MiB input XML, I get a peak memory usage of 2.1 GiB.
The whole thing executes in roughly 4 to 14 s depending on optimization flags. With tcmalloc it even drops to 2.7 s.
At least 50% of the memory sits directly in the ptree containers. In your PHP question you (correctly) mentioned that reading it all into a single DOM is just not such a great idea.
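If you want to reproduce a peak-memory figure without a heap profiler, one rough option is to ask the OS at the end of the run. The sketch below is only an assumption about how you might measure it, not how the numbers above were obtained; `peak_rss_kib` is a hypothetical helper name, and it reports resident set size, which will not match a heap profiler's total exactly.

```cpp
#include <sys/resource.h>  // getrusage (POSIX)

// Hypothetical helper: peak resident set size of the current process so far.
// On Linux, ru_maxrss is reported in kilobytes; other systems may use other units.
long peak_rss_kib() {
    struct rusage usage{};
    if (getrusage(RUSAGE_SELF, &usage) != 0) {
        return -1;  // -1 signals that the query failed
    }
    return usage.ru_maxrss;
}
```

Calling `peak_rss_kib()` right after `read_xml` returns, and printing the result, gives a quick number to compare between parsers.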
Even so, with a more appropriate and capable library such as PugiXML, execution is over 10x faster and memory usage is roughly one sixth.
Here's the code:
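#include <pugixml.hpp>
#include <iostream>
int main() {
    pugi::xml_document doc;
    // load_file reports failure through its return value rather than throwing
    pugi::xml_parse_result result = doc.load_file("input.xml");
    if (!result) {
        std::cerr << "parse failed: " << result.description() << "\n";
        return 1;
    }
}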
Imagine what happens if you optimize for memory usage by using a streaming API.