
Remove namespace and prefix from xml in python using lxml


I have an XML file that I need to open and modify. One of the changes is to remove the namespace and prefix, then save the result to another file. Here is the XML:

<?xml version='1.0' encoding='UTF-8'?>
<package xmlns="http://apple.com/itunes/importer">
  <provider>some data</provider>
  <language>en-GB</language>
</package>

I can make the other changes I need, but I can't work out how to remove the namespace and prefix. This is the resulting XML I need:

<?xml version='1.0' encoding='UTF-8'?>
<package>
  <provider>some data</provider>
  <language>en-GB</language>
</package>

And here is my script which will open and parse the xml and save it:

from lxml import etree

metadata = '/Users/user1/Desktop/Python/metadata.xml'
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse(metadata, parser)  # parse() opens the file itself
root = tree.getroot()
tree.write('/Users/user1/Desktop/Python/done.xml', pretty_print=True, xml_declaration=True, encoding='UTF-8')

So what code should I add to my script to remove the namespace and prefix?


Solution

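  • Replace each element's tag with its local name, as Uku Loskit suggests. In addition, call lxml.objectify.deannotate with cleanup_namespaces=True so that namespace declarations that are no longer used get dropped.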

    from lxml import etree, objectify
    
    metadata = '/Users/user1/Desktop/Python/metadata.xml'
    parser = etree.XMLParser(remove_blank_text=True)
    tree = etree.parse(metadata, parser)
    root = tree.getroot()
    
    ####    
    for elem in root.iter():  # iter() replaces the deprecated getiterator()
        if not hasattr(elem.tag, 'find'):
            continue  # skip comments and PIs, whose tag is not a string
        i = elem.tag.find('}')
        if i >= 0:
            elem.tag = elem.tag[i+1:]  # drop the leading '{namespace-uri}'
    objectify.deannotate(root, cleanup_namespaces=True)
    ####
    
    tree.write('/Users/user1/Desktop/Python/done.xml',
               pretty_print=True, xml_declaration=True, encoding='UTF-8')
    

    Note: Some nodes, such as comments, return a function rather than a string from their tag attribute, so the loop includes a guard for that.
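
    For trying the approach without touching files, here is a minimal in-memory sketch. It is a variation on the code above, not the same code: it uses etree.QName to get the local name and etree.cleanup_namespaces in place of objectify.deannotate. The helper name strip_namespaces and the SAMPLE document (the question's XML plus one comment) are made up for illustration.

```python
from lxml import etree

# Sample input: the XML from the question, plus a comment to exercise the guard.
SAMPLE = b"""<?xml version='1.0' encoding='UTF-8'?>
<package xmlns="http://apple.com/itunes/importer">
  <!-- a comment node, whose .tag is not a string -->
  <provider>some data</provider>
  <language>en-GB</language>
</package>
"""


def strip_namespaces(root):
    """Rewrite every element tag in place to its local name (hypothetical helper)."""
    for elem in root.iter():
        # Comments and processing instructions expose a function as .tag.
        if not isinstance(elem.tag, str):
            continue
        elem.tag = etree.QName(elem).localname
    # Drop the xmlns declarations that no element uses any more.
    etree.cleanup_namespaces(root)
    return root


root = strip_namespaces(etree.fromstring(SAMPLE))
result = etree.tostring(root, pretty_print=True, encoding="unicode")
print(result)
```

    This only renames elements. Namespaced attributes, if the document has any, keep their prefixes and declarations and would need the same treatment through elem.attrib.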