Tags: python, xml, minidom, pythonanywhere

XML parsing error with Python Script with pythonanywhere (but not on local machine)


I'm running a Flask app in Python, part of which uses XML data retrieved from a third-party API. I use minidom to parse the XML within the Python script.

Relevant python code:

from xml.dom import minidom
import requests

usa_xml = requests.get(URL_HERE)
usa_parsed = minidom.parseString(usa_xml.content)

The script goes on to locate and display values from the XML. On my local machine, everything works as it should. After deploying the repository to PythonAnywhere, parsing fails on the same XML data.

Error traceback:

Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.4/dist-packages/flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.4/dist-packages/flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.4/dist-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/usr/local/lib/python3.4/dist-packages/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.4/dist-packages/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/jshorty/OwlWire/owlwire.py", line 65, in select
    usa_parsed = minidom.parseString(usa_xml.content)
  File "/usr/lib/python3.4/xml/dom/minidom.py", line 1970, in parseString
    return expatbuilder.parseString(string)
  File "/usr/lib/python3.4/xml/dom/expatbuilder.py", line 925, in parseString
    return builder.parseString(string)
  File "/usr/lib/python3.4/xml/dom/expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
xml.parsers.expat.ExpatError: no element found: line 29, column 7    

Since the same XML parses fine locally, this doesn't seem to be an issue with the XML itself. I'm stumped as to where to start looking for the problem. I'm using the default modules pre-installed on PythonAnywhere; could this be caused by a different version of minidom?

The error is always at line 29, column 7, so here is a link to one instance of the XML I'm accessing: http://ebird.org/ws1.1/data/obs/region_spp/recent?rtype=country&r=US&sci=surnia%20ulula&back=30&maxResults=1&includeProvisional=true


Solution

  • My guess is that you're using a free account. Free accounts on PythonAnywhere have restricted Internet access, so you can only reach sites that are on the whitelist:

    https://www.pythonanywhere.com/wiki/403ForbiddenError

    https://www.pythonanywhere.com/whitelist/

    To check, adjust your code to print the response:

    usa_xml = requests.get(URL_HERE)
    print(usa_xml)
    

    You'll probably see:

    <Response [403]> 
    

    Status 403 means "Forbidden": the request was blocked, so the response body is not the XML you expected.

    We (the PythonAnywhere team) are usually happy to add sites with a public API to the whitelist. ebird.org looks fine, so I'll see if I can get it added. For anyone else with a similar request, don't hesitate to get in touch with us if you see a 403!
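
    More generally, checking the HTTP status before handing the body to minidom makes this kind of failure show up as a clear error rather than an ExpatError. The following is only a minimal sketch, not the original script: the helper name parse_xml_response is hypothetical, and the stand-in response objects (with made-up XML content) are used here so that it runs without network access.

```python
from types import SimpleNamespace
from xml.dom import minidom


def parse_xml_response(response):
    """Parse response.content as XML, but only after checking the HTTP status.

    `response` is any object with `status_code` and `content` attributes,
    such as the object returned by requests.get(). (Hypothetical helper.)
    """
    if response.status_code != 200:
        # Fail early with a clear message instead of an ExpatError later.
        raise RuntimeError(
            "Request failed with HTTP status %d" % response.status_code
        )
    return minidom.parseString(response.content)


# Stand-in responses (assumed data) so the sketch runs offline.
ok = SimpleNamespace(status_code=200, content=b"<result><sighting/></result>")
forbidden = SimpleNamespace(status_code=403, content=b"Forbidden")

print(parse_xml_response(ok).documentElement.tagName)
try:
    parse_xml_response(forbidden)
except RuntimeError as exc:
    print(exc)
```

    In the real script this would be called as parse_xml_response(requests.get(URL_HERE)). The requests library also provides response.raise_for_status(), which raises requests.exceptions.HTTPError for 4xx and 5xx responses and could be used in place of the manual check.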