python scrapy web-crawler scrapinghub

Scrapinghub: "Error caught on signal handler: <bound method ?" when yielding an item


I have a Scrapy script that works locally, but when I deploy it to Scrapinghub, every item fails with an error. After some debugging, I found that the error is raised when the item is yielded.

This is the error I get.

ERROR   [scrapy.utils.signal] Error caught on signal handler: <bound method ?.item_scraped of <sh_scrapy.extension.HubstorageExtension object at 0x7fd39e6141d0>>
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 150, in maybeDeferred
    result = f(*args, **kw)
  File "/usr/local/lib/python2.7/site-packages/pydispatch/robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
  File "/usr/local/lib/python2.7/site-packages/sh_scrapy/extension.py", line 45, in item_scraped
    item = self.exporter.export_item(item)
  File "/usr/local/lib/python2.7/site-packages/scrapy/exporters.py", line 304, in export_item
    result = dict(self._get_serialized_fields(item))
  File "/usr/local/lib/python2.7/site-packages/scrapy/exporters.py", line 75, in _get_serialized_fields
    value = self.serialize_field(field, field_name, item[field_name])
  File "/usr/local/lib/python2.7/site-packages/scrapy/exporters.py", line 284, in serialize_field
    return serializer(value)
  File "/usr/local/lib/python2.7/site-packages/scrapy/exporters.py", line 290, in _serialize_value
    return dict(self._serialize_dict(value))
  File "/usr/local/lib/python2.7/site-packages/scrapy/exporters.py", line 300, in _serialize_dict
    key = to_bytes(key) if self.binary else key
  File "/usr/local/lib/python2.7/site-packages/scrapy/utils/python.py", line 117, in to_bytes
    'object, got %s' % type(text).__name__)
TypeError: to_bytes must receive a unicode, str or bytes object, got int

It doesn't specify the field with issues, but by process of elimination, I came to realize it's this part of the code:

    try:
        item["media"] = {}
        media_index = 0

        media_content = response.xpath("//audio/source/@src").extract_first()
        if media_content is not None:
            item["media"][media_index] = {}
            preview = item["media"][media_index]
            preview["Media URL"] = media_content
            preview["Media Type"] = "Audio"
            media_index += 1
    except IndexError:
        print "Index error for media " + item["asset_url"]

I trimmed some parts to make it easier to follow, but this is the part that causes the issue. Something about item["media"] is not accepted.

I'm a beginner in both Python and Scrapy, so sorry if this turns out to be a silly, basic Python mistake. Any ideas?

EDIT: Following ThunderMind's answer, the fix was simply to use str(media_index) as the key.


Solution

  • Yeah, right here:

    item["media"][media_index] = {}
    

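    media_index is an int. An int is a valid key for a plain Python dict, which is why the code runs locally. The problem is on the export side: as the traceback shows, the exporter used on Scrapinghub calls to_bytes on every key of a nested dict (_serialize_dict), and to_bytes accepts only unicode, str, or bytes. Use a string key instead, for example str(media_index).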
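
    A minimal sketch of that fix, runnable without Scrapy. The names build_media and to_bytes_standin are hypothetical: to_bytes_standin is only a simplified stand-in for the type check seen in the traceback, not Scrapy's actual code, and the example URL is made up. It is written for Python 3.

```python
def to_bytes_standin(key):
    """Simplified stand-in for the key check in the traceback (assumption, not Scrapy's code)."""
    if isinstance(key, bytes):
        return key
    if not isinstance(key, str):
        raise TypeError("to_bytes must receive a unicode, str or bytes "
                        "object, got %s" % type(key).__name__)
    return key.encode("utf-8")


def build_media(media_urls, media_type="Audio"):
    """Build the nested media dict using string keys ("0", "1", ...)."""
    media = {}
    found = [url for url in media_urls if url is not None]
    for index, url in enumerate(found):
        # str(index) keeps every key a text type, so it can be encoded on export.
        media[str(index)] = {"Media URL": url, "Media Type": media_type}
    return media


media = build_media(["http://example.com/preview.mp3"])
for key in media:
    to_bytes_standin(key)  # passes: every key is a str

try:
    to_bytes_standin(0)  # an int key, as in the question
except TypeError as exc:
    print(exc)
```

    In the spider this would amount to item["media"] = build_media([media_content]). If the index itself carries no meaning, a plain list of dicts is probably simpler still, since it has no keys to encode.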