This is a follow-up to a question I asked this morning - Using Beautiful Soup to get data from non-class section
I am trying to take a set of information and search it against a set of keywords. If a keyword is found in the set of information, I want to bail. However, my code is not finding the keyword, even though one of the keywords is in the set of information.
negative_keywords = ['basement', 'unfinished', 'hardwood'] # defined at beginning of script
bodyContents = soup.find(attrs={'id' : 'postingbody'})
for validate in negative_keywords:
    if (string.find(str(bodyContents.string).lower(),validate) != -1):
        keyword_found = TRUE
        continue
Here is the sample data:
<section id="postingbody">
3BR/2BA newly renovated ranch
<p>
<b></b>
</p>
<hr></hr>
<h2>
Some address
</h2>
<h2>
$950.00 / Month
</h2>
<h3 style="color:maroon;">
- Description:
</h3>
<blockquote>
3BR/2BA newly renovated ranch. Near all that Towne…
</blockquote>
<h3 style="color:maroon;">
- Details:
</h3>
<ul>
<li></li>
<li></li>
<li>
<b></b>
No
</li>
<li></li>
<li></li>
<li></li>
<li></li>
This is how I would do it:
import BeautifulSoup
negative_keywords = ['basement', 'unfinished', 'hardwood']
html = '''
<section id="postingbody">
Looking for a corporate rental, this beautiful decorated 5 BR,
4.5 BA two story house is in a desirable location, 7 minutes off
I 85. Beautiful solid cherry cabinets in kitchen and laundry room.
All stainless steel appliances. Hardwood floors in kitchen and foyer,
Ceramic tile floors in all bathrooms, laundry room, dining room and sunroom.
<br>
</br>
</section>
'''
soup = BeautifulSoup.BeautifulSoup(html)
bodyContents = soup.find(attrs={'id' : 'postingbody'})
if any([k in bodyContents.getText().lower() for k in negative_keywords]):
    print "keyword was found"
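A likely reason the original loop never matches: in Beautiful Soup, `.string` is `None` when a tag has more than one child, so `str(bodyContents.string)` would be the literal text `None` rather than the posting body. The sketch below illustrates that difference using plain strings instead of parsed HTML, so it runs without Beautiful Soup installed. The names `find_negative_keyword`, `posting_text`, and `text_from_none` are hypothetical and only for illustration.

```python
def find_negative_keyword(text, keywords):
    """Return the first keyword contained in text (case-insensitive), or None."""
    lowered = text.lower()
    for keyword in keywords:
        if keyword in lowered:
            return keyword  # bail out on the first match
    return None


negative_keywords = ['basement', 'unfinished', 'hardwood']

# Stand-in for the text that getText() would return for the posting body.
posting_text = "All stainless steel appliances. Hardwood floors in kitchen and foyer."

# Mimics str(bodyContents.string) when .string is None (assumed cause of the bug).
text_from_none = str(None)

print(find_negative_keyword(posting_text, negative_keywords))
print(find_negative_keyword(text_from_none, negative_keywords))
```

Returning the matching keyword instead of a plain boolean makes it easy to log why a posting was rejected, and the early `return` gives the "bail on first match" behaviour the question asks for.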