This is my code that checks multiple URLs for a specific keyword and writes to the output file whether the keyword was found or not.
import requests
import pandas as pd
from bs4 import BeautifulSoup

df = pd.read_csv('/path/to/input.csv')
urls = df.T.values.tolist()[2]  # third column of the CSV as a list
myList = []

for url in urls:
    url_1 = url
    keyword = 'myKeyword'
    res = requests.get(url_1)
    finalresult = print(keyword in res.text)
    if finalresult == False:
        myList.append("NOT OK")
    else:
        myList.append("OK")

df["myList"] = pd.DataFrame(myList, columns=['myList'])
df.to_csv('/path/to/output.csv', index=False)
However, as soon as one of the URLs is down and a connection error occurs, the script stops and the following error is displayed:
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='argos-yoga.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x122582d90>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))
How can I ignore such errors and let my script continue with the scan? Could someone help me with this? Thanks.
Put a try..except block around just the requests.get() call and the res.text access.
For example:
import requests
import pandas as pd
from bs4 import BeautifulSoup

df = pd.read_csv('/path/to/input.csv')
urls = df.T.values.tolist()[2]
myList = []

for url in urls:
    url_1 = url
    keyword = 'myKeyword'
    try:  # <-- put try..except here
        res = requests.get(url_1)
        finalresult = keyword in res.text  # <-- remove print(), which always returns None
    except requests.exceptions.RequestException:  # <-- catch requests errors only, not a bare except
        finalresult = False
    if finalresult == False:
        myList.append("NOT OK")
    else:
        myList.append("OK")

df["myList"] = pd.DataFrame(myList, columns=['myList'])
df.to_csv('/path/to/output.csv', index=False)
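As a side note, requests.get() has no default timeout, so a host that accepts the connection but never answers will hang the loop forever. A minimal sketch of the same loop body with a timeout added (the 10-second value is an arbitrary example, not something from your original script):

    try:
        res = requests.get(url_1, timeout=10)  # raises requests.exceptions.Timeout after 10 s
        finalresult = keyword in res.text
    except requests.exceptions.RequestException:  # Timeout is a subclass, so it is caught too
        finalresult = False

Unrelated to the error handling: df["myList"] = myList would also work here; wrapping the list in a DataFrame first isn't necessary when the lengths match.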
EDIT: To put "Down" into the list when there's an error:
for url in urls:
    url_1 = url
    keyword = 'myKeyword'
    try:  # <-- put try..except here
        res = requests.get(url_1)
        if keyword in res.text:
            myList.append("OK")
        else:
            myList.append("NOT OK")
    except requests.exceptions.RequestException:
        myList.append("Down")