Tags: python, scrapy, scrapy-shell

XPath returns an empty list when the class contains spaces


Python 2.7

I want to get the background image URL and the title of each news item, but my XPath always returns an empty list when I try to get the image URL.

Here is what I tried:

scrapy shell http://www.wownews.tw/fashion/movie

and then

response.body

I can see the HTML in the terminal. But when I type

response.xpath('//div[@class="text ng-scope"]')

I get an empty list, although I thought it should work.

Is the problem caused by the class containing spaces?

How can I fix it? Any help would be appreciated.

I also tried the following command and still get an empty list:

response.xpath('//div[contains(concat(" ", normalize-space(@class), " "), "text ng-scope")]')

Solution
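
On the question of whether the space is to blame: it probably is not. The sketch below uses the standard library's ElementTree rather than Scrapy's selectors, purely to stay self-contained, and runs the same exact-match expression against two made-up snippets. RAW and RENDERED are hypothetical markup, not copies of the real page. The expression matches a class value containing a space without trouble, and it returns nothing only when the elements are absent from the markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, not taken from the real page. RAW stands in for what
# the server sends; RENDERED stands in for the DOM after scripts have run.
RAW = '<html><body><div id="app"></div></body></html>'
RENDERED = (
    '<html><body><div id="app">'
    '<div class="text ng-scope">First headline</div>'
    '<div class="text ng-scope">Second headline</div>'
    '</div></body></html>'
)

# Same exact-match predicate as in the question, written as a relative path.
QUERY = ".//div[@class='text ng-scope']"


def matching_texts(markup):
    """Return the text of every div whose class is exactly 'text ng-scope'."""
    root = ET.fromstring(markup)
    return [div.text for div in root.findall(QUERY)]


raw_matches = matching_texts(RAW)            # elements absent, so no match
rendered_matches = matching_texts(RENDERED)  # space in the value is fine

print(raw_matches)
print(rendered_matches)
```

A quick way to check the real page is to search response.body for the string ng-scope in the Scrapy shell; if it is not there, no XPath expression will find it.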

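  • Here is everything you need. The ng-scope class is normally added by AngularJS in the browser, so the listing is most likely rendered by JavaScript and is not present in the HTML that Scrapy downloads. The spider below requests the JSON API instead: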

    import json
    import scrapy
    
    
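    class ListingSpider(scrapy.Spider):
        name = 'listing'
    
        # Query the JSON API directly rather than the HTML page.
        start_urls = ['http://api.wownews.tw/f/pages/site/558fd617913b0c11001d003d?category=5590a6a3f0a8bf110060914d&children=true&limit=48&page=1']
    
        def parse(self, response):
            # The body is JSON, so parse it instead of running XPath on it.
            items = json.loads(response.body)['results']
    
            # Each entry is a dict, which Scrapy accepts as an item.
            for item in items:
                yield item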
    

    Refer to https://medium.com/@yashpokar/scrape-any-website-in-the-internet-without-using-splash-or-selenium-68a6c9733369
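
The spider above yields every entry whole, while the question only asks for the title and image URL. A minimal sketch of narrowing each entry down could look like the following. Only the top-level 'results' key comes from the spider; the 'title' and 'image' field names, the sample payload, and the extract_fields helper are assumptions for illustration, so inspect one real entry first and substitute the actual keys.

```python
import json

# Hypothetical payload: only the top-level 'results' key is taken from the
# spider above. The 'title' and 'image' field names are assumptions.
SAMPLE_BODY = (
    b'{"results": ['
    b'{"title": "First headline", "image": "http://example.com/a.jpg"},'
    b'{"title": "Second headline"}'
    b']}'
)


def extract_fields(body):
    """Return one dict per entry, keeping only the title and image URL."""
    if isinstance(body, bytes):
        body = body.decode('utf-8')
    results = json.loads(body).get('results', [])
    # dict.get returns None for a missing key instead of raising KeyError.
    return [
        {'title': entry.get('title'), 'image_url': entry.get('image')}
        for entry in results
    ]


fields = extract_fields(SAMPLE_BODY)
```

Inside parse, the loop would then iterate over extract_fields(response.body) and yield each resulting dict.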