
BeautifulSoup append ignores namespace (XML)


Hi, I want to add individual picture elements to an existing pictures tag. I am doing it like this:

for i, el in enumerate(data.get("pictures").get('picture')):
    img_link = el.get("link")[0].get('href')
    test = BeautifulSoup("""
        <pic:picture>
            <pic:link rel="thumbnail" href="xxxx" />
        </pic:picture>
    """, "xml")
    soup.ad.pictures.append(test)
print(soup.ad.pictures)

The result looks like this:

<pic:pictures>
<picture>
<link href="xxxx" rel="thumbnail"/>
</picture><picture>
<link href="xxxx" rel="thumbnail"/>
</picture><picture>
<link href="xxxx" rel="thumbnail"/>
</picture><picture>
<link href="xxxx" rel="thumbnail"/>
</picture></pic:pictures>

Why are the namespaces gone? I previously tried new_tag, and the tags it created did keep their namespaces. Adding pic:pictures works fine with the new_tag method, but I was not able to add pic:link to pic:pictures.


Solution

  • The problem is how you create the new tag.

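    When you create the new tag with the BeautifulSoup() constructor, the string is parsed as a standalone document. That fragment does not declare its prefix (ns1 below, pic in your case), so the parser drops the prefix and the tag ends up without a namespace. Consider this example: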

    from bs4 import BeautifulSoup
    
    main_xml = '''\
    <tag xmlns:ns1="http://namespace1/" xmlns:ns2="http://namespace2/">
        <ns1:child>I'm in namespace 1</ns1:child>
        <ns2:child>I'm in namespace 2</ns2:child>
    </tag>'''
    
    
    soup = BeautifulSoup(main_xml, 'xml')
    
    # add tag created by BeautifulSoup() constructor
    test = BeautifulSoup("""<ns1:child>New Tag</ns1:child>""", "xml")
    soup.tag.append(test)
    
    print(soup)
    

    Prints:

    <?xml version="1.0" encoding="utf-8"?>
    <tag xmlns:ns1="http://namespace1/" xmlns:ns2="http://namespace2/">
    <ns1:child>I'm in namespace 1</ns1:child>
    <ns2:child>I'm in namespace 2</ns2:child>
    <child>New Tag</child>
    </tag>
    

    As you can see, the newly appended tag has lost its ns1: prefix.
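If you still want to build the new element from a string, one workaround (not part of the original answer; a sketch assuming bs4 with the lxml-backed "xml" parser) is to declare the prefix inside the fragment itself, so the parser has a namespace to bind it to instead of dropping it:

```python
from bs4 import BeautifulSoup

main_xml = '''\
<tag xmlns:ns1="http://namespace1/" xmlns:ns2="http://namespace2/">
    <ns1:child>I'm in namespace 1</ns1:child>
    <ns2:child>I'm in namespace 2</ns2:child>
</tag>'''

soup = BeautifulSoup(main_xml, "xml")

# Declare ns1 inside the fragment too, so the parser can resolve the prefix
fragment = BeautifulSoup(
    '<ns1:child xmlns:ns1="http://namespace1/">New Tag</ns1:child>', "xml"
)

# Appending a BeautifulSoup object moves its children into the target tag
soup.tag.append(fragment)

print(soup)
```

The appended element then carries its own xmlns:ns1 declaration, which repeats the one on the root but is harmless. For the question's pic:picture fragment you would need an xmlns:pic declaration with whatever URI the real document binds to pic, which the question does not show.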


    However, if you create the new tag with the soup.new_tag() method, the prefix is kept. new_tag() does not re-parse the name, so ns1:child is stored as-is and resolves against the xmlns:ns1 declaration already on the root element:

    # add tag created by .new_tag() method
    test = soup.new_tag('ns1:child')
    test.string = "I'm new tag!"
    soup.tag.append(test)
    
    print(soup)
    

    Prints:

    <?xml version="1.0" encoding="utf-8"?>
    <tag xmlns:ns1="http://namespace1/" xmlns:ns2="http://namespace2/">
    <ns1:child>I'm in namespace 1</ns1:child>
    <ns2:child>I'm in namespace 2</ns2:child>
    <ns1:child>I'm new tag!</ns1:child>
    </tag>
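Applied to the question, the same approach means creating both pic:picture and pic:link with soup.new_tag() and nesting them. Below is a minimal sketch; the pic namespace URI, the sample document, and the links list are hypothetical stand-ins, since the question does not show the real document or the structure of data:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the asker's document; the real URI bound to
# the pic prefix is not shown in the question.
source = """\
<ad xmlns:pic="http://example.com/pic">
    <pic:pictures/>
</ad>"""

soup = BeautifulSoup(source, "xml")

# Hypothetical hrefs, standing in for the values read from data["pictures"]
links = ["https://example.com/a.jpg", "https://example.com/b.jpg"]

for href in links:
    # new_tag() keeps the literal "pic:" name, so the prefix survives
    picture = soup.new_tag("pic:picture")
    # Keyword arguments become attributes on the new tag
    link = soup.new_tag("pic:link", rel="thumbnail", href=href)
    picture.append(link)
    soup.ad.pictures.append(picture)

print(soup.ad.pictures)
```

In the asker's loop, href would simply be the img_link value computed from each el.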