
How to re-scrape a page if there is an error in parse method?


The first action in my parse method is to extract a dictionary from a JSON string contained in the HTML. I've noticed that I sometimes get an error because the web page doesn't render correctly and thus doesn't contain the JSON string. If I rerun the spider, the same page displays fine and it carries on until another random JSON error.

I'd like to check that I've got the error handling correct:

import json
from json import JSONDecodeError

def parse(self, response):
    json_str = response.xpath("<xpath_to_json>").get()
    try:
        items = json.loads(json_str)["items"]
    except JSONDecodeError:
        return response.follow(url=response.url, callback=self.parse)
    for i in items:
        # do stuff

I'm pretty sure this will work OK, but I wanted to check a couple of things:

  1. If this hits a 'genuinely bad' page where there is no JSON, will the spider get stuck in a loop, or does Scrapy give up after trying a given URL a certain number of times?
  2. I've used a return instead of a yield because I don't want to continue running the method. Is this OK?

Any other comments are welcome too!!


Solution

  • I think return on a decoding error should be OK in your case, since the scraper isn't iterating through scraped results at that point. Normally response.follow and Request filter out duplicate requests, so you need to pass dont_filter=True when calling them to allow repeated requests for the same URL. To cap the number of retries, it's not the cleanest approach, but you can keep a dictionary on the spider that tracks the retry count per URL (self.retry_count in the code below), increment it each time that URL is parsed, and raise once the limit is hit.

    import json
    from json import JSONDecodeError
    import scrapy


    class TestSpider(scrapy.Spider):
        name = "test"

        def start_requests(self):
            urls = [
                "https://quotes.toscrape.com/page/1/",
                "https://quotes.toscrape.com/page/2/"
            ]
            # initialise the per-URL retry counter and limit once, before scheduling requests
            self.retry_count = {url: 0 for url in urls}
            self.retry_limit = 3
            for url in urls:
                yield scrapy.Request(url=url, callback=self.parse, dont_filter=True)

        def parse(self, response):
            self.retry_count[response.url] += 1
            json_str = "{\"items\": 1"  # malformed on purpose to trigger a JSON decode error
            print(f'===== RUN {response.url}; Attempt: {self.retry_count[response.url]} =====')
            try:
                items = json.loads(json_str)["items"]
            except JSONDecodeError as ex:
                print("==== ERROR ====")
                if self.retry_count[response.url] >= self.retry_limit:
                    raise ex
                else:
                    # re-request the same URL; dont_filter=True bypasses the duplicate filter
                    return response.follow(url=response.url, callback=self.parse, dont_filter=True)

            self.retry_count[response.url] = 0  # reset the counter as this parse succeeded
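
  • If you would rather not keep retry state on the spider object, a variant is to carry the attempt count in the request's meta dict, which is roughly how Scrapy's built-in RetryMiddleware tracks its retry_times value for HTTP-level retries. The sketch below is illustrative only: the <xpath_to_json> placeholder comes from your question and the limit of 3 is arbitrary.

    import json
    from json import JSONDecodeError
    import scrapy


    class MetaRetrySpider(scrapy.Spider):
        name = "test_meta"
        start_urls = [
            "https://quotes.toscrape.com/page/1/",
            "https://quotes.toscrape.com/page/2/",
        ]
        retry_limit = 3

        def parse(self, response):
            attempt = response.meta.get("parse_retries", 0)
            json_str = response.xpath("<xpath_to_json>").get()  # replace with your real selector
            try:
                items = json.loads(json_str)["items"]
            except (JSONDecodeError, TypeError):
                # TypeError covers the case where the XPath matched nothing and json_str is None
                if attempt + 1 >= self.retry_limit:
                    self.logger.error("Giving up on %s after %d attempts", response.url, attempt + 1)
                    return
                # re-request the same page, carrying the attempt count in meta
                yield response.follow(
                    response.url,
                    callback=self.parse,
                    dont_filter=True,
                    meta={"parse_retries": attempt + 1},
                )
                return
            for i in items:
                yield {"item": i}  # do stuff with each item

Because each retried request carries its own count, parallel pages don't interfere and there is no shared dictionary to reset on success. Note that this version yields items, so the retry request also has to be yielded rather than returned: a return with a value inside a generator callback is ignored by Scrapy.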