I am trying to scrape reviews from a web page. The reviews fall into three categories: positive, neutral, and negative. I am using BeautifulSoup with the HTML parser and can already access many other tags, but how do I select the review divs when their class can take any of these three forms?
<div class="review positive" title="" style="background-color: #00B551;">9.3</div>
<div class="review negative" title="" style="background-color: #FF0000;">4.8</div>
<div class="review neutral" title="" style="background-color: #FFFF00;">6</div>
I have a container for the div that wraps each item:
# find each product on the store page
containers = page_soup.findAll("div", {"class": "item-container"})
for container in containers:
    title = container.find("a").text  # this gives me the title
    # Similarly, I need each item's review here. Besides "review", the class
    # also contains "positive", "neutral" or "negative", depending on the review type.
    review = container.findAll("div", {"class": "review "})
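The filter "review " (with a trailing space) matches nothing, because BeautifulSoup compares the string against each individual class name ("review", "positive") and against the full attribute value ("review positive"), and none of these equals "review ". Using a regex, you can match any div whose class contains the substring "review":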
import re
for container in containers:
    title = container.find("a").text  # this gives me the title
    review = container.findAll("div", {"class": re.compile(r'review')})
See the difference:
html = '''<div class="review positive" title="" style="background-color: #00B551;">9.3</div>
<div class="review negative" title="" style="background-color: #FF0000;">4.8</div>
<div class="review neutral" title="" style="background-color: #FFFF00;">6</div>'''
from bs4 import BeautifulSoup
import re
soup = BeautifulSoup(html, 'html.parser')
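# exact string with a trailing space: matches no class name
review = soup.find_all('div', {'class': 'review '})
print('No regex:', review)
# regex: matches any class name containing "review"
review = soup.findAll("div", {"class": re.compile(r'review')})
print('Regex:', review)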
Output:
No regex: []
Regex: [<div class="review positive" style="background-color: #00B551;" title="">9.3</div>, <div class="review negative" style="background-color: #FF0000;" title="">4.8</div>, <div class="review neutral" style="background-color: #FFFF00;" title="">6</div>]
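If you also need the score and the category for each item, here is a minimal sketch of one way to do it. It relies on the fact that BeautifulSoup treats class as a multi-valued attribute, so class_="review" (no trailing space, no regex) already matches all three variants; note that the regex r'review' would additionally match class names such as "reviews" or "preview". The sample markup, the <a> titles ("Product A" and so on) and the extract_reviews helper are assumptions made up for illustration, since the real page structure is not shown in the question.

```python
from bs4 import BeautifulSoup

# Sample markup: the "item-container" wrapper and the <a> titles are assumed
# from the question's description, not taken from the real page.
html = '''
<div class="item-container">
  <a href="#">Product A</a>
  <div class="review positive" title="" style="background-color: #00B551;">9.3</div>
</div>
<div class="item-container">
  <a href="#">Product B</a>
  <div class="review negative" title="" style="background-color: #FF0000;">4.8</div>
</div>
<div class="item-container">
  <a href="#">Product C</a>
  <div class="review neutral" title="" style="background-color: #FFFF00;">6</div>
</div>
'''


def extract_reviews(markup):
    """Return a list of (title, score, category) tuples, one per container.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(markup, "html.parser")
    results = []
    for container in soup.find_all("div", class_="item-container"):
        link = container.find("a")
        # class_="review" matches any div whose class list includes "review"
        review = container.find("div", class_="review")
        if link is None or review is None:
            continue  # skip containers without a title or a review
        # review["class"] is a list such as ["review", "positive"]
        categories = [c for c in review["class"] if c != "review"]
        category = categories[0] if categories else None
        title = link.get_text(strip=True)
        score = float(review.get_text(strip=True))
        results.append((title, score, category))
    return results


reviews = extract_reviews(html)
for title, score, category in reviews:
    print(title, score, category)
```

The category is read from the class list rather than from the background color, which should be more robust if the page's styling changes. If the real page nests the title or the review differently, the two find calls would need adjusting.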