Search code examples
pythonxmlprettify

XML Prettifying from file in Python


I have an xml file which looks like the example below.

Many texts contain space as the start character, or have \n (newline) at the beginning, or other crazy stuff. I'm working with xml.etree.ElementTree, and it is good to parse from this file.

But I want more! :) I tried to prettify this mess, but without success. Tried many tutorials, but it always ends without pretty XML.

<?xml version="1.0"?>
<import>
<article>
<name> Name with space
</name>
<source> Daily Telegraph
</source>
<number>72/2015
</number>
<page>10
</page>
<date>2015-03-26
</date>
<author> Tomas First
</author>
<description>Economy
</description>
<attachment>
</attachment>
<region>
</region>
<text>
 My text is here
</text>
</article>
<article>
<name> How to parse
</name>
<source> Internet article
</source>
<number>72/2015
</number>
<page>1
</page>
<date>2015-03-26
</date>
<author>Some author
</author>
<description> description
</description>
<attachment>
</attachment>
<region>
</region>
<text>
 My text here
</text>
</article>
</import>

When I tried another answers from SO it generates same file or more messy XML


Solution

  • bs4 can do it

    from bs4 import BeautifulSoup
    
    doc = BeautifulSoup(xmlstring, 'xml')
    
    print doc.prettify()