I'm parsing some XML using Python's Expat bindings (by calling parser = xml.parsers.expat.ParserCreate() and then setting the relevant callbacks to my methods).
It seems that when Expat calls read(nbytes) to fetch new data, nbytes is always 2,048. I have quite a lot of XML to process, and I suspect that these small read()s are making the overall process rather slow. As a point of reference, I'm seeing throughput of around 9 MB/s on an Intel Xeon X5550 (2.67 GHz) running Windows 7.
I've tried setting parser.buffer_text = True and parser.buffer_size = 65536, but Expat is still calling the read() method with an argument of just 2,048.
Is it possible to increase this?
You're talking about the xmlparser.ParseFile method, right?
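Unfortunately, no: that value is hardcoded as BUF_SIZE = 2048 in pyexpat.c. The buffer_text and buffer_size attributes only control how character data is batched before CharacterDataHandler is called; they don't change the size of the read() calls made by ParseFile.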
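One possible workaround is to skip ParseFile and feed the parser yourself with Parse(), which lets you pick the read size. The following is only a minimal sketch, assuming Python 3 and a file opened in binary mode; parse_in_chunks, count_elements and CHUNK_SIZE are hypothetical names made up for this example, and whether larger reads actually improve your throughput is something you would need to measure.

```python
import io
import xml.parsers.expat

CHUNK_SIZE = 64 * 1024  # hypothetical value; tune it for your workload


def parse_in_chunks(fileobj, parser, chunk_size=CHUNK_SIZE):
    """Feed fileobj to an Expat parser using reads of chunk_size bytes."""
    while True:
        data = fileobj.read(chunk_size)
        if not data:
            break
        # Second argument False: more data will follow.
        parser.Parse(data, False)
    # An empty final call tells Expat the document is complete.
    parser.Parse(b"", True)


def count_elements(fileobj):
    """Example use: count start tags while reading in large chunks."""
    parser = xml.parsers.expat.ParserCreate()
    count = 0

    def on_start(name, attrs):
        nonlocal count
        count += 1

    parser.StartElementHandler = on_start
    parse_in_chunks(fileobj, parser)
    return count
```

Your existing callbacks should work unchanged with this approach, since only the way data reaches the parser is different.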