With Python 3 and lxml, how to extract the Version number from a SOAP WSDL?


When I test with a subset of the WSDL file, with namespaces omitted from both the file and the code, it works fine. With the full file, the code below fails.

#  for reference, these are the final lines from the WSDL
#
#   <wsdl:service name="Shopping">
#           <wsdl:documentation>
#               <Version>1027</Version>
#           </wsdl:documentation>
#       <wsdl:port binding="ns:ShoppingBinding" name="Shopping">
#           <wsdlsoap:address location="http://open.api.ebay.com/shopping"/>
#       </wsdl:port>
#   </wsdl:service>
#</wsdl:definitions>

from lxml import etree
wsdl = etree.parse('http://developer.ebay.com/webservices/latest/ShoppingService.wsdl')
print(wsdl.findtext('wsdl:.//Version'))   # wish this would print 1027

/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 "/Users/matecsaj/Google Drive/Projects/collectibles/eBay/figure-it3.py"
Traceback (most recent call last):
  File "src/lxml/_elementpath.py", line 79, in lxml._elementpath.xpath_tokenizer (src/lxml/_elementpath.c:2414)
KeyError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/matecsaj/Google Drive/Projects/collectibles/eBay/figure-it3.py", line 14, in <module>
    print(wsdl.findtext('wsdl:.//Version'))   # wish this would print 1027
  File "src/lxml/etree.pyx", line 2230, in lxml.etree._ElementTree.findtext (src/lxml/etree.c:69049)
  File "src/lxml/etree.pyx", line 1552, in lxml.etree._Element.findtext (src/lxml/etree.c:60629)
  File "src/lxml/_elementpath.py", line 329, in lxml._elementpath.findtext (src/lxml/_elementpath.c:10089)
  File "src/lxml/_elementpath.py", line 311, in lxml._elementpath.find (src/lxml/_elementpath.c:9610)
  File "src/lxml/_elementpath.py", line 300, in lxml._elementpath.iterfind (src/lxml/_elementpath.c:9282)
  File "src/lxml/_elementpath.py", line 277, in lxml._elementpath._build_path_iterator (src/lxml/_elementpath.c:8675)
  File "src/lxml/_elementpath.py", line 82, in xpath_tokenizer (src/lxml/_elementpath.c:2542)
SyntaxError: prefix 'wsdl' not found in prefix map

Process finished with exit code 1

Solution

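  • With credit to the kind folks who commented, here is a modified solution that does print the Version number. The only approach I could get working was a wildcard search over the WSDL namespace. The iterator skipped the Version element itself, presumably because Version carries no wsdl: prefix and so is not in the namespace the wildcard matches, so I had to reach it through its parent documentation element. Good enough.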

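    from lxml import etree

    # Namespace URI bound to the wsdl: prefix in the file
    wsdlLink = "http://schemas.xmlsoap.org/wsdl/"
    wsdl = etree.parse('http://developer.ebay.com/webservices/latest/ShoppingService.wsdl')
    # '{namespace}*' matches every element in the WSDL namespace
    for element in wsdl.iter('{' + wsdlLink + '}*'):
        if 'documentation' in element.tag:
            # Version is a child of wsdl:documentation, so read it from there
            for child in element:
                print(child.text)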
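
For anyone landing here later: the SyntaxError in the question comes from two things. The wsdl: prefix was placed in front of the whole path instead of in front of an element name, and no prefix map was passed to findtext, so the prefix could not be resolved. Below is a minimal sketch of the prefix-map approach, not a tested fix against the live eBay file. SAMPLE_WSDL, NAMESPACES and extract_version are hypothetical names; the sample document is a trimmed stand-in built from the excerpt in the question, and it assumes Version is in no namespace. If the real file declares a default namespace, Version would need to be qualified with that namespace as well.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Fallback so the sketch still runs without lxml; findtext takes the
    # same path and namespaces arguments in the standard library.
    import xml.etree.ElementTree as etree

# Hypothetical, trimmed-down stand-in for the real WSDL.
# Assumption: Version itself is in no namespace (no default xmlns declared).
SAMPLE_WSDL = """\
<wsdl:definitions xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
                  xmlns:wsdlsoap="http://schemas.xmlsoap.org/wsdl/soap/">
  <wsdl:service name="Shopping">
    <wsdl:documentation>
      <Version>1027</Version>
    </wsdl:documentation>
    <wsdl:port binding="ns:ShoppingBinding" name="Shopping">
      <wsdlsoap:address location="http://open.api.ebay.com/shopping"/>
    </wsdl:port>
  </wsdl:service>
</wsdl:definitions>
"""

# Prefix map: the key is the prefix used in the path, the value is the URI.
NAMESPACES = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}


def extract_version(root):
    """Return the text of Version under wsdl:documentation, or None."""
    # The prefix goes on the element name, and the map is passed explicitly.
    return root.findtext(".//wsdl:documentation/Version", namespaces=NAMESPACES)


root = etree.fromstring(SAMPLE_WSDL)
print(extract_version(root))
```

With lxml, the same function should also accept the tree returned by etree.parse(...) for the full file, provided the assumption about Version's namespace holds there.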