python google-app-engine lxml iterparse

GAE Python LXML - XMLSyntaxError Specification mandate value for attribute object


I am using Google App Engine on Python and am trying to fetch a GZipped XML file and parse it with LXML's iterparse. I used the example from lxml.de to create the following code:

import gzip, base64, StringIO
from lxml import etree
from google.appengine.ext import webapp
from google.appengine.api.urlfetch import fetch

class Catalog(webapp.RequestHandler):
    user = xxx
    password = yyy
    catalog = fetch('url',
                    headers={"Authorization":
                             "Basic %s" % base64.b64encode(user + ':' + password)}, deadline=600)
    items = etree.iterparse(StringIO.StringIO(catalog), tag='product')

    for _, element in items:
        print('%s -- %s' % (element.findtext('name'), element[1].text))
        element.clear()

When I run it, I get the following error:

for _, element in coupons:
File "iterparse.pxi", line 491, in lxml.etree.iterparse.__next__ (src/lxml\lxml.etree.c:98565)
File "iterparse.pxi", line 543, in lxml.etree.iterparse._read_more_events (src/lxml\lxml.etree.c:99086)
File "parser.pxi", line 590, in lxml.etree._raiseParseError (src/lxml\lxml.etree.c:74791)
XMLSyntaxError: Specification mandate value for attribute object, line 1, column 53

What does this error mean? I am guessing that the XML file is malformed, but I don't know where to look for the problem. Any help would be appreciated!


Solution

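  • The error most likely came from the data handed to the parser, not from the XML file itself. The original code passed the fetch result object, rather than its content, to StringIO, so lxml was presumably parsing the object's string representation (something like <google.appengine.api.urlfetch._URLFetchResult object at 0x...>), in which the word "object" ending at column 53 looks like an attribute without a value. The response body was also never decompressed. I solved the problem by reading catalogResponse.content, decompressing it with gzip.GzipFile, activating asynchronous requests, and using webapp2. With all of this in place it worked :) Here's the code: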

    from google.appengine.api.urlfetch import fetch
    import gzip, webapp2, base64, StringIO, datetime
    from credentials import CJCredentials
    from lxml import etree

    class Catalog(webapp2.RequestHandler):
        def get(self):
            user = xxx
            password = yyy
            url = 'some_url'

            catalogResponse = fetch(url, headers={
                "Authorization": "Basic %s" % base64.b64encode(user + ':' + password)
            }, deadline=10000000)

            # Wrap the raw response body (not the response object) in a file-like object
            f = StringIO.StringIO(catalogResponse.content)
            # Decompress the gzipped body
            c = gzip.GzipFile(fileobj=f)
            content = c.read()

            xml = StringIO.StringIO(content)

            # Stream over <product> elements only
            tree = etree.iterparse(xml, tag='product')

            for event, element in tree:
                # lxml elements have no .name attribute; read the <name> child instead
                print element.findtext('name')
                # Free the element once it has been processed
                element.clear()
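The decompress-then-iterparse step above can also be tried outside App Engine. The following is only a minimal sketch, written for Python 3 and using the standard library's xml.etree.ElementTree as a stand-in for lxml (it has no tag= argument, so the tag is checked by hand). FakeResponse, iter_product_names, and the sample catalog are hypothetical names and data made up for illustration; they are not part of urlfetch or lxml.

```python
import gzip
import io
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree (assumption)


def iter_product_names(gzipped_bytes):
    """Yield the <name> text of each <product> in a gzip-compressed XML payload."""
    # GzipFile is file-like, so iterparse can read from it directly and the
    # decompressed document never has to be held in memory as one string.
    with gzip.GzipFile(fileobj=io.BytesIO(gzipped_bytes)) as xml_stream:
        for _event, element in ET.iterparse(xml_stream, events=("end",)):
            if element.tag == "product":
                yield element.findtext("name")
                element.clear()  # drop the children of the finished element


class FakeResponse(object):
    """Hypothetical stand-in for the object returned by urlfetch.fetch()."""

    def __init__(self, content):
        self.content = content


# Made-up sample data, for illustration only.
sample_xml = (
    b"<catalog>"
    b"<product><name>Kettle</name></product>"
    b"<product><name>Toaster</name></product>"
    b"</catalog>"
)
response = FakeResponse(gzip.compress(sample_xml))

# Correct: parse the response *body*.
names = list(iter_product_names(response.content))

# Mistake from the question: parsing the response object itself. Its string
# form is a repr such as "<...FakeResponse object at 0x...>", which is not XML.
try:
    ET.fromstring(str(response))
    repr_parsed = True
except ET.ParseError:
    repr_parsed = False
```

Here names ends up holding the two product names, while the attempt to parse the object's string form fails with a parse error, which mirrors the situation in the question. Handing the GzipFile straight to iterparse is a design choice that avoids the intermediate c.read() and second StringIO; whether that matters depends on the size of the catalog.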