Handle json.loads error when Scrapy response.xpath finds nothing


I am trying to get data from a website with Scrapy, using the following code:

jsondata = response.xpath('//script[@type="application/ld+json"]/text()').extract_first()
microdata = json.loads(jsondata)   
author = microdata["author"]["name"]
editor = microdata["editor"]["name"]
daten = microdata["datePublished"]

but it gives me an error if the JSON part '//script[@type="application/ld+json"]/text()' is not found on the page.

Thanks for any help


Solution

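extract_first() returns None when the XPath matches nothing, and json.loads(None) raises a TypeError. Check for None before parsing:

    import json

    import scrapy


    class RefSpider(scrapy.Spider):
        name = "refspider"

        start_urls = ['https://www.antaranews.com/berita/2320530/gempa-di-padang-lawas-utara-dipicu-oleh-aktivitas-sesar-sumatera',
                      'https://www.antaranews.com/foto/2320526/penjualan-pernak-pernik-hiasan-kemerdekaan']

        def parse(self, response):
            # extract_first() returns None if no matching <script> tag exists
            jsondata = response.xpath('//script[@type="application/ld+json"]/text()').extract_first()

            # Only parse when the JSON-LD block was actually found
            if jsondata is not None:
                microdata = json.loads(jsondata)
                author = microdata["author"]["name"]
                editor = microdata["editor"]["name"]
                daten = microdata["datePublished"]
                # Emit the extracted fields as an item
                yield {"author": author, "editor": editor, "date": daten}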
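
The None check covers a missing script tag, but the page could also have a tag with invalid JSON or without an "author" or "editor" key. As a rough sketch, assuming you would rather get None for missing fields than an exception, the parsing could be moved into a small helper. The name extract_article_fields is hypothetical and not part of Scrapy; it only uses the standard json module.

```python
import json
from typing import Optional


def extract_article_fields(raw: Optional[str]) -> Optional[dict]:
    """Hypothetical helper: pull author, editor and date out of a JSON-LD string.

    Returns None when there is nothing usable to parse.
    """
    if raw is None:
        # The XPath matched nothing, so there is no JSON to load.
        return None
    try:
        microdata = json.loads(raw)
    except json.JSONDecodeError:
        # The <script> tag existed but did not contain valid JSON.
        return None
    if not isinstance(microdata, dict):
        # JSON-LD can also be a list of objects; this sketch handles one object only.
        return None

    author = microdata.get("author")
    editor = microdata.get("editor")
    return {
        # Missing or non-object fields become None instead of raising KeyError.
        "author": author.get("name") if isinstance(author, dict) else None,
        "editor": editor.get("name") if isinstance(editor, dict) else None,
        "date": microdata.get("datePublished"),
    }
```

Inside parse, this could be called as item = extract_article_fields(jsondata), yielding the item only when it is not None.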