Tags: python-3.x, error-handling, python-requests, python-requests-html

How to ignore HTTP errors while using requests with for loop?


This is my code: it checks multiple URLs for a specific keyword and writes to an output file whether or not the keyword was found.

import requests
import pandas as pd
from bs4 import BeautifulSoup

df = pd.read_csv('/path/to/input.csv')
urls = df.T.values.tolist()[2]
myList= []

for url in urls:
    url_1 = url
    keyword ='myKeyword'
    res = requests.get(url_1)
    finalresult= print(keyword in res.text)

    if finalresult == False:
        myList.append("NOT OK")
    else:
        myList.append("OK")

df["myList"] = pd.DataFrame(myList, columns=['myList'])

df.to_csv('/path/to/output.csv', index=False)

However, as soon as one of the URLs is down and an HTTP error occurs, the script stops and displays the following error:

    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='argos-yoga.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x122582d90>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))

How can I ignore such errors and let my script continue with the scan? Could someone help me with this? Thanks.


Solution

  • Wrap only the `requests.get()` call and the `res.text` check in a `try..except` block, and catch `requests.exceptions.RequestException` rather than using a bare `except:` (a bare `except:` would also swallow things like `KeyboardInterrupt`).

    For example:

    import requests
    import pandas as pd

    df = pd.read_csv('/path/to/input.csv')
    urls = df.T.values.tolist()[2]
    myList = []

    for url in urls:
        keyword = 'myKeyword'
        try:                                      # <-- put try..except here
            res = requests.get(url)
            finalresult = keyword in res.text     # <-- remove print()
        except requests.exceptions.RequestException:
            finalresult = False

        if finalresult:
            myList.append("OK")
        else:
            myList.append("NOT OK")

    df["myList"] = pd.DataFrame(myList, columns=['myList'])

    df.to_csv('/path/to/output.csv', index=False)

    

    EDIT: to append "Down" to the list when there is an error:

    for url in urls:
        keyword = 'myKeyword'
        try:                                      # <-- put try..except here
            res = requests.get(url)

            if keyword in res.text:
                myList.append("OK")
            else:
                myList.append("NOT OK")
        except requests.exceptions.RequestException:
            myList.append("Down")
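
    One caveat: `requests.get()` only raises for network-level failures (DNS errors, refused connections, timeouts); a response with a 404 or 500 status does not raise by itself. If you also want bad status codes counted as "Down", call `res.raise_for_status()` inside the `try` block. A minimal sketch (the URL, keyword, and `check_url` helper name are illustrative, not from the original code):

    ```python
    import requests

    def check_url(url, keyword, timeout=10):
        """Return "OK", "NOT OK", or "Down" for a single URL."""
        try:
            res = requests.get(url, timeout=timeout)
            res.raise_for_status()  # treat 4xx/5xx responses as failures too
        except requests.exceptions.RequestException:
            # covers ConnectionError, Timeout, HTTPError, etc.
            return "Down"
        return "OK" if keyword in res.text else "NOT OK"

    # A host under the reserved .invalid TLD never resolves, so this is "Down":
    print(check_url("https://nonexistent.invalid/", "myKeyword"))
    ```

    Passing a `timeout` is also worth doing in the loop itself; without it, a hanging server can stall the whole scan indefinitely.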