I am making a request to a URL, and in the XML response the namespace in the xmlns attribute changes from time to time. As a result, finding an element returns None when I hardcode the namespace.
For instance I get the following XML:
<package xmlns="http://schemas.microsoft.com/packaging/2012/06/nuspec.xsd">
<metadata>
<id>SharpZipLib</id>
<version>1.1.0</version>
<authors>ICSharpCode</authors>
<owners>ICSharpCode</owners>
<requireLicenseAcceptance>false</requireLicenseAcceptance>
<licenseUrl>https://github.com/icsharpcode/SharpZipLib/blob/master/LICENSE.txt</licenseUrl>
<projectUrl>https://github.com/icsharpcode/SharpZipLib</projectUrl>
<description>SharpZipLib (#ziplib, formerly NZipLib) is a compression library for Zip, GZip, BZip2, and Tar written entirely in C# for .NET. It is implemented as an assembly (installable in the GAC), and thus can easily be incorporated into other projects (in any .NET language)</description>
<releaseNotes>Please see https://github.com/icsharpcode/SharpZipLib/wiki/Release-1.1 for more information.</releaseNotes>
<copyright>Copyright © 2000-2018 SharpZipLib Contributors</copyright>
<tags>Compression Library Zip GZip BZip2 LZW Tar</tags>
<repository type="git" url="https://github.com/icsharpcode/SharpZipLib" commit="45347c34a0752f188ae742e9e295a22de6b2c2ed"/>
<dependencies>
<group targetFramework=".NETFramework4.5"/>
<group targetFramework=".NETStandard2.0"/>
</dependencies>
</metadata>
</package>
Note the xmlns attribute. Most of the URI stays the same, but the '2012/06' part changes between responses. I have the following Python script; see the line ns = {'nuspec': 'http://schemas.microsoft.com/packaging/2013/05/nuspec.xsd'}. I can't hardcode the namespace like that. Are there any alternatives, such as using regular expressions, to map the namespace? Only the date part changes, e.g. 2013/05 in some responses and 2012/04 in others.
import requests
import xml.etree.ElementTree as ET

def fetch_nuget_spec(self, versioned_package):
    name = versioned_package.package.name.lower()
    version = versioned_package.version.lower()
    url = f'https://api.nuget.org/v3-flatcontainer/{name}/{version}/{name}.nuspec'
    response = requests.get(url)
    metadata = ET.fromstring(response.content)
    # Hardcoded namespace: this is the line that breaks when the date part changes.
    ns = {'nuspec': 'http://schemas.microsoft.com/packaging/2013/05/nuspec.xsd'}
    license = metadata.find('./nuspec:metadata/nuspec:license', ns)
    if license is None:
        license_url = metadata.find('./nuspec:metadata/nuspec:licenseUrl', ns)
        if license_url is None:
            return {'license': 'Not Found'}
        return {'license': license_url.text}
    else:
        # license.text is None for an empty element, so test truthiness instead of len().
        if not license.text:
            print('Empty license element')
        return {'license': license.text}
Without another module, using only xml.etree.ElementTree:
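import xml.etree.ElementTree as ET

tree = ET.parse('xml_str.xml')
root = tree.getroot()
# Collect every (prefix, URI) pair declared in the document; the default namespace has the prefix ''.
ns = dict([node for _, node in ET.iterparse('xml_str.xml', events=['start-ns'])])
print(ns)
# An empty-prefix entry in the namespace map applies to unprefixed names (Python 3.8+).
licenseUrl = root.find(".//licenseUrl", ns).text
print("LicenseUrl:", licenseUrl)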
Output:
{'': 'http://schemas.microsoft.com/packaging/2012/06/nuspec.xsd'}
LicenseUrl: https://github.com/icsharpcode/SharpZipLib/blob/master/LICENSE.txt
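Since the question gets the XML from an HTTP response rather than a file, here is a minimal sketch of the same start-ns idea applied to bytes in memory. The helper names default_namespace and find_license_url and the shortened SAMPLE document are made up for illustration; io.BytesIO stands in for the file that iterparse expects, and response.content from the question would be passed in place of SAMPLE.

```python
import io
import xml.etree.ElementTree as ET

# Shortened, hypothetical stand-in for response.content from the question.
SAMPLE = b"""<package xmlns="http://schemas.microsoft.com/packaging/2012/06/nuspec.xsd">
  <metadata>
    <id>SharpZipLib</id>
    <licenseUrl>https://github.com/icsharpcode/SharpZipLib/blob/master/LICENSE.txt</licenseUrl>
  </metadata>
</package>"""

def default_namespace(xml_bytes):
    """Return the URI bound to the default (empty) prefix, or None if there is none."""
    events = ET.iterparse(io.BytesIO(xml_bytes), events=['start-ns'])
    for _, (prefix, uri) in events:
        if prefix == '':
            return uri
    return None

def find_license_url(xml_bytes):
    """Look up metadata/licenseUrl using whatever namespace the document declares."""
    root = ET.fromstring(xml_bytes)
    uri = default_namespace(xml_bytes)
    if uri is None:
        # No default namespace declared: search with plain tag names.
        node = root.find('./metadata/licenseUrl')
    else:
        node = root.find('./nuspec:metadata/nuspec:licenseUrl', {'nuspec': uri})
    return None if node is None else node.text

print(find_license_url(SAMPLE))
```

Mapping the discovered URI to the nuspec prefix means the XPath expressions from the question can stay as they are; only the hardcoded ns dictionary goes away.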
Option 2, if parsing time is important:
import xml.etree.ElementTree as ET

nsmap = {}
for event, node in ET.iterparse('xml_str.xml', events=['start-ns', 'end']):
    if event == 'start-ns':
        ns, url = node
        nsmap[ns] = url
        print(nsmap)
    if event == 'end' and node.tag == f"{{{url}}}licenseUrl":
        print(node.text)
Output:
{'': 'http://schemas.microsoft.com/packaging/2012/06/nuspec.xsd'}
https://github.com/icsharpcode/SharpZipLib/blob/master/LICENSE.txt
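A third possibility, if you do not need the namespace URI at all: from Python 3.8 on, ElementTree's find accepts {*} as a namespace wildcard, so the lookup can ignore the changing date part entirely. The sketch below assumes Python 3.8+ and mirrors the license / licenseUrl fallback from the question; the function name license_from_nuspec is made up for illustration.

```python
import xml.etree.ElementTree as ET

def license_from_nuspec(xml_bytes):
    """Return the license text or URL, matching the elements in any namespace."""
    root = ET.fromstring(xml_bytes)
    # Prefer <license>, then fall back to <licenseUrl>, as in the question.
    for tag in ('license', 'licenseUrl'):
        # '{*}' matches any namespace, or none (Python 3.8+).
        node = root.find(f'./{{*}}metadata/{{*}}{tag}')
        if node is not None and node.text:
            return {'license': node.text}
    return {'license': 'Not Found'}
```

In the question's method this would be called as license_from_nuspec(response.content). The trade-off is that the wildcard also matches elements from an unrelated namespace, which is probably acceptable for nuspec files but worth knowing.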