I am trying to convert an XML file of roughly 100 MB into another XML file by copying all elements with a certain tag into the new file. Since writing the output the conventional way caused memory problems, I wanted to do this with Mako templates instead. There are about 60000 elements in the XML, and to keep memory usage low I tried to pass a generator to the template. However, this resulted in a segfault. I know very little about memory management, but the crash seems to be related to putting the content into a template, because when I just print the elements no problems arise. Am I abusing template rendering for something it isn't meant for? How can I solve this?
My rendering code:
from lxml import etree
from mako.template import Template
from mako.runtime import Context
ns = {'xmlns': 'http://namesp.ace/version/1'}
# get xml elements with correct tag
featgen = etree.iterparse('somefile.xml', tag='{%s}sometag' % ns['xmlns'], events=('start',))
templatefn = 'template.mako'
# create template
template = Template(filename=templatefn)
with open('outfile', 'w') as fp:
    ctx = Context(fp, tag=featgen)
    template.render_context(ctx)
And the template:
<%!
from lxml import etree

def tostr_xml(el, ns):
    # serialize the element, then clear it to release its memory
    strxml = etree.tostring(el)
    el.clear()
    # strip every namespace declaration listed in ns,
    # e.g. 'xmlns', 'xmlns:p', 'xmlns:xsi'
    for attr, uri in ns.items():
        strxml = strxml.replace('{0}="{1}" '.format(attr, uri), '')
    return strxml
%>
<?xml version='1.0' encoding='ASCII'?>
<root>
<features>
% for ev,el in tag:
${tostr_xml(el, {'xmlns':'http://namesp.ace/version/1'})}
% endfor
</features>
</root>
I solved the problem by turning:
featgen = etree.iterparse('somefile.xml', tag='{%s}sometag' % ns['xmlns'], events=('start',))
into:
featgen = etree.iterparse('somefile.xml', tag='{%s}sometag' % ns['xmlns'])
However, I can't say why this works. If anyone would like to explain, I'll accept that answer instead.
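For what it's worth, leaving out the events argument makes iterparse fall back to its default, events=('end',), so each element is only handed over once its closing tag has been read. With 'start' events the element is reported as soon as its opening tag is seen, so its children may not have been parsed yet when it is serialized and cleared. Whether that is what triggered the segfault here is an assumption on my part, not something I have verified. Below is a minimal sketch of the same streaming copy without a template. It uses the standard library's xml.etree.ElementTree in place of lxml (so there is no tag= keyword and the tag is compared by hand), and copy_elements and the sample data are made-up names for illustration.

```python
import io
import xml.etree.ElementTree as ET


def copy_elements(source, sink, tag):
    """Write every element named `tag` from source to sink, one at a time."""
    sink.write("<root>\n<features>\n")
    count = 0
    # 'end' events: an element is only yielded after its closing tag was read,
    # so its children are complete when it is serialized.
    for _event, el in ET.iterparse(source, events=("end",)):
        if el.tag == tag:
            el.tail = None  # drop any text that followed the element in the source
            sink.write(ET.tostring(el, encoding="unicode"))
            sink.write("\n")
            count += 1
            el.clear()  # drop the children and text that were just written
    sink.write("</features>\n</root>\n")
    return count


# Hypothetical sample input, using the namespace from the question.
NS = "http://namesp.ace/version/1"
sample = (
    '<data xmlns="%s">'
    '<sometag id="1"><name>a</name></sometag>'
    '<other/>'
    '<sometag id="2"><name>b</name></sometag>'
    '</data>' % NS
).encode("ascii")

out = io.StringIO()
written = copy_elements(io.BytesIO(sample), out, "{%s}sometag" % NS)
result = out.getvalue()
```

Because each element is written and cleared as soon as it is complete, only one feature's subtree has to be held in memory at a time. The cleared elements stay attached to the root as empty shells, which is usually negligible, but for very large inputs they could be removed from the parent as well.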