python beautifulsoup get

How do I efficiently check if data was returned in my GET request?


I am web scraping and need to parse the responses from a few thousand GET requests at a time. Some of these requests fail with 429 and/or 403 errors, so I need to check whether a response contains data before parsing it. I wrote this function:

from bs4 import BeautifulSoup

def check_response(response):
    if not response or not response.content:
        return False
    else:
        soup = BeautifulSoup(response.content, "html.parser")
        if not soup or not soup.find_all(attrs={"class": "stuff"}):
            return False

    return True

This works, but it can take quite a while to loop through a few thousand responses. Is there a better way?


Solution

  • You can use the response.status_code attribute to check the status of the response before parsing it. MDN has a full list of HTTP status codes; any code >= 400 is an error (4xx for client errors such as 403 and 429, 5xx for server errors). Try using this code:

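    from bs4 import BeautifulSoup

    def check_response(response):
        # Reject missing, empty, or failed (4xx/5xx) responses before parsing.
        if not response or not response.content or response.status_code >= 400:
            return False
        else:
            soup = BeautifulSoup(response.content, "html.parser")
            # Require at least one element with class "stuff".
            if not soup or not soup.find_all(attrs={"class": "stuff"}):
                return False
        return True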
    

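    Note that the final return True sits at function level, outside the else block, so it runs whenever neither return False is hit. Because the if branch already returns, the else is optional and its body could be dedented one level without changing the behavior.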
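
As a rough sketch of how the check above might be tightened for a large batch: do the cheap status and content tests first so failed responses are never parsed, and use find (which stops at the first match) rather than find_all when you only need to know whether a match exists. The name has_data, the css_class parameter, and the SimpleNamespace stand-ins for real response objects are assumptions made for illustration; they are not part of the original code.

```python
from types import SimpleNamespace

from bs4 import BeautifulSoup


def has_data(response, css_class="stuff"):
    """Return True if the response succeeded and contains a matching element.

    `has_data` and `css_class` are hypothetical names used for this sketch.
    """
    # Cheap checks first: no parsing for missing, failed, or empty responses.
    if response is None or response.status_code >= 400 or not response.content:
        return False
    soup = BeautifulSoup(response.content, "html.parser")
    # find() returns the first matching tag or None, so it does not
    # build a list of every match the way find_all() does.
    return soup.find(attrs={"class": css_class}) is not None


# Stand-in objects with the two attributes the function reads (assumption:
# real responses expose `status_code` and `content` in the same way).
responses = [
    SimpleNamespace(status_code=429, content=b"Too Many Requests"),
    SimpleNamespace(status_code=200, content=b""),
    SimpleNamespace(status_code=200, content=b'<div class="stuff">ok</div>'),
    SimpleNamespace(status_code=200, content=b"<p>no match</p>"),
]
results = [has_data(r) for r in responses]
```

The document is still parsed in full for successful responses, so most of the saving in this sketch comes from skipping the parse for 403 and 429 responses. If you go on to extract data from the same pages, it may be cheaper to keep the parsed soup rather than parse each response twice.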