
How to solve ConnectionError (RemoteDisconnected) in Python?


I am trying to scrape https://gmatclub.com/forum/decision-tracker.html. I can get most of what I want, but the script sometimes fails with ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')).

How do I solve it?

My code is:

import requests

link = 'https://gmatclub.com/api/schools/v1/forum/app-tracker-latest-updates'
params = {
    'limit': 500,
    'offset': 0,
    'year': 'all'
}

with requests.Session() as con:
    con.headers["User-Agent"] = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.86 YaBrowser/21.3.0.740 Yowser/2.5 Safari/537.36"
    con.get("https://gmatclub.com/forum/decision-tracker.html")
    while True:
        endpoint = con.get(link, params=params).json()
        if not endpoint["statistics"]:
            break
        for item in endpoint["statistics"]:
            print(item['school_title'])

        params['offset'] += 499

Solution

  • One strategy is to repeat the request until the server returns a valid response, waiting a few seconds between attempts. For example:

    import requests
    from time import sleep
    
    link = "https://gmatclub.com/api/schools/v1/forum/app-tracker-latest-updates"
    params = {"limit": 500, "offset": 0, "year": "all"}
    
    with requests.Session() as con:
        con.headers[
            "User-Agent"
        ] = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.86 YaBrowser/21.3.0.740 Yowser/2.5 Safari/537.36"
        con.get("https://gmatclub.com/forum/decision-tracker.html")
        while True:
    
            # repeat until we get a valid response from the server:
            while True:
                try:
                    endpoint = con.get(link, params=params).json()
                    break
                except requests.exceptions.ConnectionError:
                    sleep(3)  # wait a little bit and try again
                    continue
    
            if not endpoint["statistics"]:
                break
            for item in endpoint["statistics"]:
                print(item["school_title"])
    
            params["offset"] += 499
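
The inner retry loop above never gives up, so the script would hang if the server stayed unreachable. As a rough sketch of one possible variation (not part of the original answer), the helper below caps the number of attempts and doubles the wait after each failure. The name `retry_call` and its parameters `max_attempts`, `base_delay`, and `sleep` are hypothetical, chosen only for this illustration.

```python
import time


def retry_call(func, exceptions, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func() until it succeeds or max_attempts is reached.

    Hypothetical helper for illustration only.
    `exceptions` is an exception class (or tuple of classes) worth retrying.
    The wait doubles after each failed attempt: base_delay, 2*base_delay, ...
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == max_attempts:
                # Out of attempts: let the last error propagate to the caller.
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Assuming the session `con`, `link`, and `params` from the answer, the inner `while True` loop could then be replaced with `endpoint = retry_call(lambda: con.get(link, params=params).json(), requests.exceptions.ConnectionError)`. Passing `sleep` as a parameter is only a convenience so the waiting can be swapped out when experimenting.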