I am trying to scrape a website using BeautifulSoup, and I am having trouble getting the ratings from a review. They are stored in a table in which every star is a span tag, and the filled stars have the classes 'star fill'.
seat_comfort = Ratings.select_one('tr:has(td:first-child:-soup-contains("Seat Comfort")) td.review-rating-stars.stars, span.star fill')
value_for_money = Ratings.select_one('tr:has(td:first-child:-soup-contains("Value for Money")) td.review-rating-stars.stars, span.star fill')
inflight_entertainment = Ratings.select_one('tr:has(td:first-child:-soup-contains("Inflight Entertainment")) td.review-rating-stars.stars, span.star fill')
print(seat_comfort)
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td>
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td>
print(value_for_money)
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star fill">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td>
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td>
print(inflight_entertainment)
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span></td>
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td>
I hope to get 1 for Seat Comfort, 2 for Value for Money, and 3 for Inflight Entertainment.
The question needs some improvement (formatting, the initial HTML or the URL), so this answer can only point you in the right direction.
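Select the elements that carry both classes, star and fill, and take the len() of the resulting ResultSet. In CSS the two classes have to be chained as span.star.fill; with a space, span.star fill looks for a <fill> element inside the span and never matches: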
len(soup.select('.review-rating-stars span.star.fill'))
Or extract the text of the last filled star, which equals the rating as long as the filled stars come first:
soup.select('.review-rating-stars span.star.fill')[-1].text
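A small self-contained sketch of both approaches, run against one rating cell copied from the question's output (one filled star). It also shows that the question's span.star fill selector matches nothing. The helper names count_filled and last_filled are hypothetical, and last_filled assumes the filled stars are numbered 1..n from the left; since [-1] raises IndexError when no star is filled, it falls back to 0 in that case.

```python
from bs4 import BeautifulSoup

# One rating cell taken from the question's output: 1 of 5 stars filled
cell_html = (
    '<table><tr><td class="review-rating-stars stars">'
    '<span class="star fill">1</span><span class="star">2</span>'
    '<span class="star">3</span><span class="star">4</span>'
    '<span class="star">5</span></td></tr></table>'
)
soup = BeautifulSoup(cell_html, "html.parser")

# Descendant selector: a <fill> element inside span.star, so nothing matches
wrong = soup.select("span.star fill")

# Compound selector: a span that has both the "star" and the "fill" class
filled = soup.select("span.star.fill")


def count_filled(tag):
    """Number of filled stars inside `tag` (hypothetical helper)."""
    return len(tag.select("span.star.fill"))


def last_filled(tag):
    """Rating read from the last filled star's text, or 0 if none is filled.

    Hypothetical helper; assumes the filled stars are numbered 1..n from the left.
    """
    stars = tag.select("span.star.fill")
    return int(stars[-1].get_text(strip=True)) if stars else 0


print(len(wrong), count_filled(soup), last_filled(soup))  # 0 1 1
```

Counting with len() does not depend on the star numbering, so it is probably the safer of the two if the site's markup changes.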
To store the ratings as structured data, use a dict that maps each row header to its number of filled stars:
{e.td.text:len(e.select('.star.fill')) for e in soup.select('table.review-ratings tr')}
Example:
from bs4 import BeautifulSoup
html = '''
<table class="review-ratings">
<tbody><tr>
<td class="review-rating-header food-beverages">Food & Beverages</td>
<td class="review-rating-stars stars">
<span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span> </td>
</tr>
<tr>
<td class="review-rating-header inflight-entertainment">Inflight Entertainment</td>
<td class="review-rating-stars stars">
<span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span> </td>
</tr>
<tr>
<td class="review-rating-header seat-comfort">Seat Comfort</td>
<td class="review-rating-stars stars">
<span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span> </td>
</tr>
<tr>
<td class="review-rating-header staff-service">Staff Service</td>
<td class="review-rating-stars stars">
<span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span> </td>
</tr>
<tr>
<td class="review-rating-header value-for-money">Value for Money</td>
<td class="review-rating-stars stars">
<span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span> </td>
</tr></tbody></table>
'''
soup = BeautifulSoup(html, 'html.parser')
{e.td.text:len(e.select('.star.fill')) for e in soup.select('table.review-ratings tr')}
{'Food & Beverages': 3,
'Inflight Entertainment': 3,
'Seat Comfort': 3,
'Staff Service': 3,
'Value for Money': 3}
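To get the three values the question asks for, here is a sketch that applies the same dict approach to markup rebuilt from the question's printed output (Seat Comfort 1, Value for Money 2, Inflight Entertainment 3). The header cells and the review-ratings table class are assumed to match the sample markup above, since the question does not show them. Using .get() returns None when a review has no row for a category, instead of raising KeyError.

```python
from bs4 import BeautifulSoup

# Assumed markup: header cells as in the sample table, star cells from the question
html = """
<table class="review-ratings"><tbody>
<tr><td class="review-rating-header seat-comfort">Seat Comfort</td>
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td></tr>
<tr><td class="review-rating-header value-for-money">Value for Money</td>
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star fill">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td></tr>
<tr><td class="review-rating-header inflight-entertainment">Inflight Entertainment</td>
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span></td></tr>
</tbody></table>
"""
soup = BeautifulSoup(html, "html.parser")

# Map each row's header text (first <td>) to its number of filled stars
ratings = {
    row.td.get_text(strip=True): len(row.select("span.star.fill"))
    for row in soup.select("table.review-ratings tr")
}

# .get() yields None instead of raising KeyError if a category is missing
seat_comfort = ratings.get("Seat Comfort")
value_for_money = ratings.get("Value for Money")
inflight_entertainment = ratings.get("Inflight Entertainment")

print(seat_comfort, value_for_money, inflight_entertainment)  # 1 2 3
```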