I have an XML file with over 2,500 <Item> elements. The example below shows the sample layout. I want to copy every line between <Item name="1st"> and <Item name="500th"> to a new file as is, then continue with the next 500 from <Item name="501st"> onwards and write those out to another new file. The result should be 5 new files, with nothing skipped.
<Item name="1st"><ItemProperties>
<property>data</property><property>data</property>
</ItemProperties>
...
...
<Item name="500th"><ItemProperties>
<property>data</property><property>data</property>
</ItemProperties>
The command below does it for the first 500, but I do not know how to keep going until the last closing tag.
xmllint --xpath "//Item[position()<=500]" FileName.XML > Output1.XML
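Using Python, the first solution is to process the file from the first line to the last, one line at a time, starting a new output file every 500 lines. Note that this counts lines, not <Item> elements, so a chunk boundary can fall in the middle of an item:
nfh = None
with open('foo.xml') as fh:
    num = 0
    for index, line in enumerate(fh):
        # Start a new output file every 500 lines
        if not index % 500:
            num += 1
            if nfh:
                nfh.close()
            nfh = open('file_name{}.txt'.format(num), 'w')
        nfh.write(line)
# Close the last output file
if nfh:
    nfh.close()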
The second is to use lxml to enumerate only a specific tag in the XML file:
import lxml.etree as etree

xml_data = etree.parse('foo.xml')
nfh = None
num = 0
for index, tag in enumerate(xml_data.xpath('//Item')):
    # Start a new output file every 500 <Item> tags
    if not index % 500:
        num += 1
        if nfh:
            nfh.close()
        # tostring() returns bytes, so open the file in binary mode
        nfh = open('Output{}.XML'.format(num), 'wb')
    nfh.write(etree.tostring(tag))
# Close the last output file
if nfh:
    nfh.close()
This assumes your XML looks more like this:
<root>
<Item name="1st"><ItemProperties>
<property>data</property><property>data</property>
</ItemProperties>
</Item>
<Item name="2nd"><ItemProperties>
<property>data</property><property>data</property>
</ItemProperties>
</Item>
....
<Item name="500th"><ItemProperties>
<property>data</property><property>data</property>
</ItemProperties>
</Item>
....
</root>
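The lxml loop above writes bare <Item> elements one after another, so each output file has no single root element and is not a well-formed XML document on its own. If that matters, here is a minimal sketch using only the standard library that wraps each group of 500 items in a root element. The names `split_items`, `_write_chunk`, `prefix`, and the `<root>` wrapper are illustrative assumptions, not anything from the original file, and it assumes the <Item> elements are not nested inside one another:

```python
import xml.etree.ElementTree as ET


def _write_chunk(chunk, prefix, num):
    """Wrap a list of <Item> elements in a <root> element and write it out."""
    root = ET.Element("root")  # hypothetical wrapper; use your real root tag
    root.extend(chunk)
    name = "{}{}.XML".format(prefix, num)
    ET.ElementTree(root).write(name, encoding="utf-8", xml_declaration=True)
    return name


def split_items(src_path, chunk_size=500, prefix="Output"):
    """Write every `chunk_size` <Item> elements to its own XML file.

    Returns the list of file names written, in order.
    """
    written = []
    chunk = []
    # iter("Item") walks all <Item> elements in document order
    for item in ET.parse(src_path).getroot().iter("Item"):
        chunk.append(item)
        if len(chunk) == chunk_size:
            written.append(_write_chunk(chunk, prefix, len(written) + 1))
            chunk = []
    if chunk:  # leftover items that did not fill a whole chunk
        written.append(_write_chunk(chunk, prefix, len(written) + 1))
    return written
```

Calling `split_items('foo.xml')` would then produce Output1.XML, Output2.XML, and so on, with the last file holding whatever items remain after the full groups of 500.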