Tags: scrapy, response, web-crawler

Scrapy 0.16: response has no attribute 'selector' or 'xpath()'


I have searched on Google and looked at questions on Stack Overflow too, but nothing has worked. Here is what I found:

response.body and response.headers work fine, but response.selector and response.xpath() raise an error saying that no such attribute exists on the response object.

I am also not able to import Selector, because there is no Selector in the scrapy package hierarchy (I don't know why).

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
class DmozSpider(BaseSpider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
    ]

    def parse(self, response):
        for sel in response.xpath('//ul/li'):
            title = sel.xpath('a/text()').extract()
            link = sel.xpath('a/@href').extract()
            desc = sel.xpath('text()').extract()
            print title, link, desc

I am using Scrapy 0.16 (working with Django Dynamic Scraper, so I can't upgrade because it is compatible with this version only).


Solution

  • You are probably looking at the documentation for the latest version. There have been quite a few changes since 0.16: in particular, the response.selector attribute and the response.xpath() shortcut did not exist yet in 0.16. You should be looking at the documentation for 0.16: http://doc.scrapy.org/en/0.16

    Your example should look like this:

    from scrapy.spider import BaseSpider
    from scrapy.selector import HtmlXPathSelector
    
    class DmozSpider(BaseSpider):
        name = "dmoz"
        allowed_domains = ["dmoz.org"]
        start_urls = [
            "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
            "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
        ]
    
        def parse(self, response):
            hxs = HtmlXPathSelector(response)  # in 0.16 you wrap the response yourself
            sites = hxs.select('//ul/li')      # and use .select() instead of .xpath()
            for site in sites:
                title = site.select('a/text()').extract()
                link = site.select('a/@href').extract()
                desc = site.select('text()').extract()
                print title, link, desc
    

    This approach is described in the 0.16 tutorial: http://doc.scrapy.org/en/0.16/intro/tutorial.html
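
    If you want the spider to yield structured data instead of just printing it, here is a minimal sketch under the same assumptions (Scrapy 0.16, the dmoz example). The DmozItem class and its title/link/desc fields are hypothetical and not part of the original question:

    from scrapy.item import Item, Field
    from scrapy.selector import HtmlXPathSelector
    from scrapy.spider import BaseSpider

    class DmozItem(Item):
        # hypothetical fields, added here for illustration only
        title = Field()
        link = Field()
        desc = Field()

    class DmozItemSpider(BaseSpider):
        name = "dmoz_items"
        allowed_domains = ["dmoz.org"]
        start_urls = [
            "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        ]

        def parse(self, response):
            hxs = HtmlXPathSelector(response)   # 0.16 API: wrap the response
            for site in hxs.select('//ul/li'):  # .select(), not .xpath()
                item = DmozItem()
                item['title'] = site.select('a/text()').extract()
                item['link'] = site.select('a/@href').extract()
                item['desc'] = site.select('text()').extract()
                yield item

    Yielding items this way lets the scraped data flow through item pipelines instead of just going to stdout, which is also how the 0.16 tutorial structures its example.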