Parsing XML in Python when the xmlns namespace changes

I am making a request to a URL, and in the XML response the namespace in the xmlns attribute changes from time to time. As a result, finding an element returns None when I hardcode the namespace.

For instance, I get the following XML:

<package xmlns="http://schemas.microsoft.com/packaging/2012/06/nuspec.xsd">
<metadata>
<id>SharpZipLib</id>
<version>1.1.0</version>
<authors>ICSharpCode</authors>
<owners>ICSharpCode</owners>
<requireLicenseAcceptance>false</requireLicenseAcceptance>
<licenseUrl>https://github.com/icsharpcode/SharpZipLib/blob/master/LICENSE.txt</licenseUrl>
<projectUrl>https://github.com/icsharpcode/SharpZipLib</projectUrl>
<description>SharpZipLib (#ziplib, formerly NZipLib) is a compression library for Zip, GZip, BZip2, and Tar written entirely in C# for .NET. It is implemented as an assembly (installable in the GAC), and thus can easily be incorporated into other projects (in any .NET language)</description>
<releaseNotes>Please see https://github.com/icsharpcode/SharpZipLib/wiki/Release-1.1 for more information.</releaseNotes>
<copyright>Copyright © 2000-2018 SharpZipLib Contributors</copyright>
<tags>Compression Library Zip GZip BZip2 LZW Tar</tags>
<repository type="git" url="https://github.com/icsharpcode/SharpZipLib" commit="45347c34a0752f188ae742e9e295a22de6b2c2ed"/>
<dependencies>
<group targetFramework=".NETFramework4.5"/>
<group targetFramework=".NETStandard2.0"/>
</dependencies>
</metadata>
</package>

Look at the xmlns attribute. Most of it stays the same, but the date part ('2012/06' here) differs between responses: some use 2013/05, others 2012/04, and so on. Below is my Python script; see the line ns = {'nuspec': 'http://schemas.microsoft.com/packaging/2013/05/nuspec.xsd'}. I can't hardcode the namespace like that. Is there an alternative, such as using regular expressions, to map the namespace?

import requests
import xml.etree.ElementTree as ET

def fetch_nuget_spec(self, versioned_package):
    name = versioned_package.package.name.lower()
    version = versioned_package.version.lower()
    url = f'https://api.nuget.org/v3-flatcontainer/{name}/{version}/{name}.nuspec'
    response = requests.get(url)
    metadata = ET.fromstring(response.content)
    # Hardcoded namespace: the lookups below return None when the date part differs
    ns = {'nuspec': 'http://schemas.microsoft.com/packaging/2013/05/nuspec.xsd'}
    license = metadata.find('./nuspec:metadata/nuspec:license', ns)
    if license is None:
        license_url = metadata.find('./nuspec:metadata/nuspec:licenseUrl', ns)
        if license_url is None:
            return {'license': 'Not Found'}
        return {'license': license_url.text}
    # license.text is None for an empty element, so test truthiness instead of len()
    if not license.text:
        print('Empty license element')
    return {'license': license.text}

Solution

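Without any additional module, using only xml.etree.ElementTree (mapping the empty prefix '' to the default namespace in find() requires Python 3.8 or later):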

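import xml.etree.ElementTree as ET

tree = ET.parse('xml_str.xml')
root = tree.getroot()

# Collect the (prefix, uri) pairs declared in the document; the default namespace has prefix ''
ns = dict([node for _, node in ET.iterparse('xml_str.xml', events=['start-ns'])])
print(ns)

# The '' entry in ns moves the unprefixed tag name into the default namespace
licenseUrl = root.find(".//licenseUrl", ns).text
print("LicenseUrl: ", licenseUrl)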

Output:

{'': 'http://schemas.microsoft.com/packaging/2012/06/nuspec.xsd'}
LicenseUrl:  https://github.com/icsharpcode/SharpZipLib/blob/master/LICENSE.txt
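
Since the question parses response.content rather than a file, the same idea can be sketched for in-memory bytes. This is only a minimal sketch: parse_with_namespaces and the sample bytes are hypothetical names and data introduced for illustration, and it assumes the document declares a default namespace on the root element, as the nuspec responses above do. It reads the input once and maps the discovered URI to an explicit prefix, which also works on Python versions before 3.8.

```python
import io
import xml.etree.ElementTree as ET

def parse_with_namespaces(xml_bytes):
    """Parse in-memory XML once and return (root, nsmap)."""
    nsmap = {}
    root = None
    events = ("start-ns", "start")
    for event, payload in ET.iterparse(io.BytesIO(xml_bytes), events=events):
        if event == "start-ns":
            prefix, uri = payload
            # Keep the first declaration seen for each prefix
            nsmap.setdefault(prefix, uri)
        elif root is None:
            # The first "start" event carries the root element
            root = payload
    # After the loop finishes, the tree below root is fully built
    return root, nsmap

# Hypothetical sample standing in for response.content
sample = (
    b'<package xmlns="http://schemas.microsoft.com/packaging/2013/05/nuspec.xsd">'
    b"<metadata><licenseUrl>https://example.com/LICENSE</licenseUrl></metadata>"
    b"</package>"
)
root, nsmap = parse_with_namespaces(sample)
ns = {"nuspec": nsmap[""]}  # bind the default namespace to an explicit prefix
print(root.find("./nuspec:metadata/nuspec:licenseUrl", ns).text)
```

With this, the XPath expressions from the question can stay as they are; only the ns dictionary is built from the response instead of being hardcoded.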

Option 2, if parsing time is important (a single pass with iterparse):

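import xml.etree.ElementTree as ET

nsmap = {}
for event, node in ET.iterparse('xml_str.xml', events=['start-ns', 'end']):
    if event == 'start-ns':
        # For 'start-ns', node is a (prefix, uri) tuple; '' is the default namespace
        prefix, uri = node
        nsmap[prefix] = uri
        print(nsmap)
    # Compare against the default namespace, not whichever namespace was declared last
    elif node.tag == f"{{{nsmap.get('', '')}}}licenseUrl":
        print(node.text)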

Output:


{'': 'http://schemas.microsoft.com/packaging/2012/06/nuspec.xsd'}
https://github.com/icsharpcode/SharpZipLib/blob/master/LICENSE.txt
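
Another possibility, shown here only as a sketch, is to read the namespace straight from the root element's tag, since ElementTree stores qualified names as "{uri}localname". The helpers namespace_of and find_license are hypothetical names introduced for illustration; the lookup order (license first, then licenseUrl) mirrors the question's function, and the code assumes the root element is namespaced.

```python
import xml.etree.ElementTree as ET

def namespace_of(element):
    """Return the namespace URI embedded in element.tag, or '' if there is none."""
    if element.tag.startswith("{"):
        return element.tag[1:].partition("}")[0]
    return ""

def find_license(xml_bytes):
    """Return the license text, the license URL, or 'Not Found'."""
    root = ET.fromstring(xml_bytes)
    # Assumes the root is namespaced, as in the nuspec responses above
    ns = {"nuspec": namespace_of(root)}
    for tag in ("license", "licenseUrl"):
        node = root.find(f"./nuspec:metadata/nuspec:{tag}", ns)
        if node is not None and node.text:
            return node.text
    return "Not Found"

# Hypothetical sample; the date part of the namespace no longer matters
doc = (
    b'<package xmlns="http://schemas.microsoft.com/packaging/2012/04/nuspec.xsd">'
    b"<metadata><licenseUrl>https://example.com/LICENSE</licenseUrl></metadata>"
    b"</package>"
)
print(find_license(doc))
```

This avoids both regular expressions and a second pass over the data, at the cost of assuming that every element of interest shares the root's namespace.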