I'm writing code to scrape selected portions of visible text from a large number of web pages. Here's part of it:
divTag = soup.find_all("div", {'id':'articleBody'})
for tag in divTag:
    pTags = tag.find_all("p")
    for tag in pTags:
        print >>f, tag.text
How can I check if Python has found and written the targeted text, and put the link aside (to a list) if the scraping wasn't a success?
I didn't find an answer here, and I don't know where to look in the documentation.
Here is one way to check whether Python found the text you are looking for:
import requests
from bs4 import BeautifulSoup

urls = ['https://www.google.com']

for url in urls:
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'lxml')
    # Check every <p> on the page for the target string.
    items = soup.find_all('p')
    for item in items:
        if "2016 - Privacidad - Condiciones" in item.text:
            print("Python has found the targeted text")
If Python doesn't find the text, you need to use the remove() method.
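To put the failed links aside as the question asks, one option is to collect the paragraphs first and treat an empty result as a failure. The following is only a rough sketch in Python 3, not a definitive implementation: the helper names `extract_article_text` and `scrape_all` are made up for illustration, `fetch` is assumed to be any function that returns the HTML for a URL, and the built-in `html.parser` is used so that lxml is not required.

```python
from bs4 import BeautifulSoup


def extract_article_text(html):
    """Return the text of every <p> inside <div id="articleBody">."""
    soup = BeautifulSoup(html, "html.parser")
    paragraphs = []
    for div in soup.find_all("div", {"id": "articleBody"}):
        for p in div.find_all("p"):
            text = p.get_text(strip=True)
            if text:  # ignore empty paragraphs
                paragraphs.append(text)
    return paragraphs


def scrape_all(urls, fetch, out):
    """Write each page's article text to `out`; return the URLs that failed.

    `fetch` is a hypothetical callable mapping a URL to its HTML.
    """
    failed = []
    for url in urls:
        try:
            paragraphs = extract_article_text(fetch(url))
        except Exception:  # the download or the parse raised
            paragraphs = []
        if paragraphs:
            out.write("\n".join(paragraphs) + "\n")
        else:
            failed.append(url)  # nothing found: put the link aside
    return failed
```

With requests, `fetch` could be something like `lambda url: requests.get(url).text`, and `out` the file you already have open as `f`. The returned list then holds every link for which no text was written, so you can retry or inspect those pages later.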