I have an XML file that looks like the example below.
Many text nodes start with a space or begin with a \n (newline), or contain other crazy stuff. I'm working with xml.etree.ElementTree, and it parses this file fine.
But I want more! :) I tried to prettify this mess without success. I followed many tutorials, but none of them ended with pretty XML.
<?xml version="1.0"?>
<import>
<article>
<name> Name with space
</name>
<source> Daily Telegraph
</source>
<number>72/2015
</number>
<page>10
</page>
<date>2015-03-26
</date>
<author> Tomas First
</author>
<description>Economy
</description>
<attachment>
</attachment>
<region>
</region>
<text>
My text is here
</text>
</article>
<article>
<name> How to parse
</name>
<source> Internet article
</source>
<number>72/2015
</number>
<page>1
</page>
<date>2015-03-26
</date>
<author>Some author
</author>
<description> description
</description>
<attachment>
</attachment>
<region>
</region>
<text>
My text here
</text>
</article>
</import>
When I tried other answers from SO, they generated the same file or even messier XML.
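For comparison, since xml.etree.ElementTree is already in use, the cleanup can also be sketched with the standard library alone. This is a minimal sketch, assuming Python 3.9+ (where ET.indent was added); the helper name prettify is hypothetical. It strips leading and trailing whitespace from every element's text and tail (note this also trims whitespace around mixed content), then lets ET.indent re-create consistent indentation.

```python
import xml.etree.ElementTree as ET


def prettify(xml_string: str) -> str:
    """Hypothetical helper: trim stray whitespace, then re-indent.

    Assumes Python 3.9+ for ET.indent.
    """
    root = ET.fromstring(xml_string)
    for elem in root.iter():
        # Trim spaces/newlines around each element's own text;
        # whitespace-only text becomes None.
        if elem.text is not None:
            elem.text = elem.text.strip() or None
        # Drop whitespace between elements; ET.indent re-creates it below.
        if elem.tail is not None:
            elem.tail = elem.tail.strip() or None
    # Insert fresh, consistent indentation (two spaces per level).
    ET.indent(root, space="  ")
    # encoding="unicode" returns a str (the <?xml ...?> declaration is not included).
    return ET.tostring(root, encoding="unicode")
```

Run on the example above, each leaf such as `<name> Name with space\n</name>` should come out as `<name>Name with space</name>`, and empty elements like `<attachment>` collapse to `<attachment />`.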
bs4 can do it:
from bs4 import BeautifulSoup  # the 'xml' parser needs lxml installed

# xmlstring holds the raw contents of the XML file
doc = BeautifulSoup(xmlstring, 'xml')
print(doc.prettify())
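If installing bs4 (plus lxml, which BeautifulSoup's 'xml' parser relies on) is not an option, xml.dom.minidom from the standard library can give similar output. minidom's toprettyxml does not trim existing text nodes, which is one reason naive attempts produce even messier XML, so the sketch below strips them first. The helpers strip_text and prettify_minidom are hypothetical names used only for illustration.

```python
from xml.dom import Node, minidom


def strip_text(node):
    """Hypothetical helper: trim text nodes, dropping whitespace-only ones."""
    # Iterate over a copy because removeChild mutates childNodes.
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE:
            child.data = child.data.strip()
            if not child.data:
                node.removeChild(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            strip_text(child)


def prettify_minidom(xml_string: str) -> str:
    dom = minidom.parseString(xml_string)
    strip_text(dom.documentElement)
    # An element whose only child is a text node is written on one line.
    return dom.toprettyxml(indent="  ")
```

Unlike the ElementTree route, toprettyxml keeps an `<?xml version="1.0" ?>` declaration at the top and writes empty elements as `<attachment/>`.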