
Controlling number of bytes read() at a time with Expat


I'm parsing some XML using Python's Expat (by calling parser = xml.parsers.expat.ParserCreate() and then setting the relevant callbacks to my methods).

It seems that when Expat calls read(nbytes) to fetch new data, nbytes is always 2,048. I have quite a lot of XML to process, and I suspect that these small read()s are making the overall process rather slow. As a point of reference, I'm seeing throughput of around 9 MB/s on a 2.67 GHz Intel Xeon X5550 running Windows 7.

I've tried setting parser.buffer_text = True and parser.buffer_size = 65536, but Expat is still calling the read() method with an argument of just 2,048.

Is it possible to increase this?


Solution

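  • You're talking about the xmlparser.ParseFile method, right?

    Unfortunately, no: that value is hardcoded as BUF_SIZE = 2048 in pyexpat.c. The buffer_size attribute only sets the size of the buffer used to coalesce character data when buffer_text is true; it has no effect on how much ParseFile reads at a time.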
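
One possible workaround, not part of the original answer, is to skip ParseFile and do the reads yourself, handing each chunk to the parser's Parse method. The sketch below assumes Python 3 and a file-like object opened in binary mode; the helper name parse_in_chunks and the 64 KiB default are illustrative choices, not anything defined by pyexpat.

```python
import io
import xml.parsers.expat


def parse_in_chunks(parser, fileobj, chunk_size=64 * 1024):
    """Feed fileobj to an Expat parser using reads of chunk_size bytes.

    parse_in_chunks is a hypothetical helper name. fileobj is assumed
    to return bytes from read(), i.e. to be opened in binary mode.
    """
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        # isfinal=False: more data will follow this chunk.
        parser.Parse(chunk, False)
    # An empty final call tells Expat the document is complete.
    parser.Parse(b"", True)


# Small in-memory usage example.
element_names = []
demo_parser = xml.parsers.expat.ParserCreate()
demo_parser.StartElementHandler = lambda name, attrs: element_names.append(name)
parse_in_chunks(demo_parser, io.BytesIO(b"<root><item/><item/></root>"))
```

Whether larger reads actually raise throughput depends on where the time goes. If most of it is spent in the Python-level callbacks rather than in read(), the gain may be small, so it is worth profiling before and after.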