Tags: web-scraping, scrapy, no-data

When using scrapy shell, I get no data from response.xpath


I am trying to scrape a betting site. However, when I check for the retrieved data in scrapy shell, I receive nothing.

The XPath of the element I need is //*[@id="yui_3_5_0_1_1562259076537_31330"], but when I run it in the shell, this is what I get:


In [18]: response.xpath ( '//*[@id="yui_3_5_0_1_1562259076537_31330"]')
Out[18]: []

The output is [], but I expected a selector from which I could extract the href.

When I use Chrome's "Inspect" tool while the site is still loading, this id is outlined in purple. Does this mean that the site uses JavaScript? And if so, is this the reason why Scrapy does not find the element and returns []?
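The id itself points toward JavaScript. Assuming it follows the pattern YUI 3 uses for ids it generates in the browser (yui_<version>_<instance>_<page-load time in ms>_<counter>; this layout is an assumption, not something stated above), the 13-digit number is a millisecond timestamp, so the id is created client-side and differs on every page load. A minimal stdlib sketch that decodes such an id; parse_yui_id is a hypothetical helper:

```python
import re
from datetime import datetime, timedelta, timezone

# ASSUMED layout of a YUI 3 generated id:
#   yui_<major>_<minor>_<patch>_<instance>_<page-load time in ms>_<counter>
YUI_ID = re.compile(r"^yui_(\d+)_(\d+)_(\d+)_(\d+)_(\d{13})_(\d+)$")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_yui_id(element_id):
    """Hypothetical helper: return (version, load_time_utc, counter), or None
    if the id does not match the assumed YUI pattern."""
    m = YUI_ID.match(element_id)
    if m is None:
        return None
    major, minor, patch, _instance, millis, counter = m.groups()
    version = f"{major}.{minor}.{patch}"
    # integer milliseconds -> exact UTC datetime (no float rounding)
    load_time = EPOCH + timedelta(milliseconds=int(millis))
    return version, load_time, int(counter)
```

Under that assumption, parse_yui_id("yui_3_5_0_1_1562259076537_31330") yields version "3.5.0" and a load time of 2019-07-04 16:51:16.537 UTC. An id stamped with the moment the browser rendered the page cannot appear in the HTML the server sends to Scrapy, which is consistent with response.xpath returning [].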


Solution

  • I tried scraping the site using only Scrapy, and this is my result.

    This is the items.py file:

        import scrapy
    
        class LifeMatchsItem(scrapy.Item):
    
            Event = scrapy.Field() # Name of event
            Match = scrapy.Field()  # Team1 vs Team2
            Date = scrapy.Field()  # Date of Match
    
    

    This is my spider code:

    
        import scrapy
        from LifeMatchesProject.items import LifeMatchsItem
    
    
        class LifeMatchesSpider(scrapy.Spider):
            name = 'life_matches'
            start_urls = ['http://www.betfair.com/sport/home#sscpl=ro/']
    
            custom_settings = {'FEED_EXPORT_ENCODING': 'utf-8'}
    
            def parse(self, response):
                for event in response.xpath('//div[contains(@class,"events-title")]'):
                    for element in event.xpath('./following-sibling::ul[1]/li'):
                        item = LifeMatchsItem()
                        item['Event'] = event.xpath('./a/@title').get()
                        item['Match'] = element.xpath('.//div[contains(@class,"event-name-info")]/a/@data-event').get()
                        item['Date'] = element.xpath('normalize-space(.//div[contains(@class,"event-name-info")]/a//span[@class="date"]/text())').get()
                        yield item
    
    

    And this is the result

    file.json