Tags: python, web-scraping, web, beautifulsoup, jupyter-notebook

How to get the text from the last span with class 'star fill'?


I am trying to scrape a website using BeautifulSoup, and I am having trouble getting the ratings from a review. They are stored in a table, and each rating corresponds to the last span tag with class 'star fill' in its row.

seatcomfort = Ratings.select_one('tr:has(td:first-child:-soup-contains("Seat Comfort")) td.review-rating-stars.stars, span.star fill')

value_for_money = Ratings.select_one('tr:has(td:first-child:-soup-contains("Value for Money")) td.review-rating-stars.stars, span.star fill')

inflight_entertainment = Ratings.select_one('tr:has(td:first-child:-soup-contains("Inflight Entertainment")) td.review-rating-stars.stars, span.star fill')

print(seatcomfort)

<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td>
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td>

print(value_for_money)

<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star fill">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td>
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td>

print(inflight_entertainment)

<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span></td>
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td>

I hope to get 1 for Seat Comfort, 2 for Value for Money, and 3 for Inflight Entertainment.


Solution

  • The question needs some improvement (formatting, the initial HTML or URL), so this should only point you in the right direction.

    Select your elements that carry both classes, star and fill (span.star.fill, with no space), and take the len() of the ResultSet:

    len(soup.select('.review-rating-stars span.star.fill'))
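To see why the selector in the question finds no stars, here is a minimal sketch on one ratings cell copied from the question's output (assuming bs4 with the built-in html.parser). class="star fill" gives the span two classes, which CSS chains as .star.fill. The space in span.star fill is a descendant combinator, so it looks for a <fill> element nested inside span.star.

```python
from bs4 import BeautifulSoup

# One ratings cell copied from the question's output (2 of 5 stars filled),
# wrapped in a table row so any parser keeps the <td>.
html = ('<table><tr><td class="review-rating-stars stars">'
        '<span class="star fill">1</span><span class="star fill">2</span>'
        '<span class="star">3</span><span class="star">4</span><span class="star">5</span>'
        '</td></tr></table>')
soup = BeautifulSoup(html, 'html.parser')

# "span.star fill": a <fill> element inside span.star, which does not exist.
broken = soup.select('span.star fill')

# "span.star.fill": a <span> that has BOTH classes, i.e. the filled stars.
filled = soup.select('span.star.fill')

print(len(broken))            # 0
print(len(filled))            # 2
print(filled[-1].get_text())  # 2
```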
    

    or extract the text of the last element:

    soup.select('.review-rating-stars span.star.fill')[-1].text
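
Keep in mind that [-1] over the whole soup returns only the last filled star on the entire page. A minimal sketch that scopes the selection to each table row instead, assuming every rating sits in its own <tr> like the cells in the question's output; the two-row sample HTML and the helper last_filled_star() are hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical two-row sample with 1 and 2 filled stars.
html = '''
<table class="review-ratings">
<tr><td class="review-rating-header">Seat Comfort</td>
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td></tr>
<tr><td class="review-rating-header">Value for Money</td>
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star fill">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td></tr>
</table>
'''
soup = BeautifulSoup(html, 'html.parser')

# Page-wide selection: [-1] is the last filled star of the LAST row only.
page_last = soup.select('.review-rating-stars span.star.fill')[-1].text
print(page_last)  # 2

def last_filled_star(row):
    """Text of the last span.star.fill inside one row, or None when no star is filled."""
    stars = row.select('span.star.fill')
    return stars[-1].get_text(strip=True) if stars else None

# Row-scoped selection: one rating per row, keyed by the header cell's text.
ratings = {row.td.get_text(strip=True): last_filled_star(row)
           for row in soup.select('table.review-ratings tr')}
print(ratings)  # {'Seat Comfort': '1', 'Value for Money': '2'}
```

This returns the star's text (a string), whereas the len() approach returns a count (an int); on this markup both give the same number.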
    

    To store structured data, use a dict:

    {e.td.text:len(e.select('.star.fill')) for e in soup.select('table.review-ratings tr')}
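If the page holds several reviews, a single dict built over every tr on the page lets later reviews silently overwrite earlier ones, because the row labels repeat. A sketch that builds one dict per ratings table, assuming (this is not confirmed by the question) that each review has its own table.review-ratings; the two-review HTML is made up for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical page with two reviews, each with its own ratings table.
html = '''
<table class="review-ratings">
<tr><td class="review-rating-header">Seat Comfort</td>
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td></tr>
</table>
<table class="review-ratings">
<tr><td class="review-rating-header">Seat Comfort</td>
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span></td></tr>
</table>
'''
soup = BeautifulSoup(html, 'html.parser')

# One dict over every row on the page: the second review overwrites the first.
merged = {e.td.text: len(e.select('.star.fill')) for e in soup.select('table.review-ratings tr')}
print(merged)  # {'Seat Comfort': 3}

# One dict per ratings table keeps the reviews apart.
per_review = [
    {row.td.get_text(strip=True): len(row.select('span.star.fill')) for row in table.select('tr')}
    for table in soup.select('table.review-ratings')
]
print(per_review)  # [{'Seat Comfort': 1}, {'Seat Comfort': 3}]
```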
    

    Example

    from bs4 import BeautifulSoup
    html = '''
    <table class="review-ratings">
    <tbody><tr>
        <td class="review-rating-header food-beverages">Food &amp; Beverages</td>
        <td class="review-rating-stars stars">
            <span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span>                                              </td>
    </tr>
                                                <tr>
        <td class="review-rating-header inflight-entertainment">Inflight Entertainment</td>
        <td class="review-rating-stars stars">
            <span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span>                                              </td>
    </tr>
                                                <tr>
        <td class="review-rating-header seat-comfort">Seat Comfort</td>
        <td class="review-rating-stars stars">
            <span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span>                                              </td>
    </tr>
                                                <tr>
        <td class="review-rating-header staff-service">Staff Service</td>
        <td class="review-rating-stars stars">
            <span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span>                                              </td>
    </tr>
                                                <tr>
        <td class="review-rating-header value-for-money">Value for Money</td>
        <td class="review-rating-stars stars">
            <span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span>                                              </td>
    </tr></tbody></table>
    '''
    soup = BeautifulSoup(html, 'html.parser')
    
    {e.td.text:len(e.select('.star.fill')) for e in soup.select('table.review-ratings tr')}
    

    Output

    {'Food & Beverages': 3,
     'Inflight Entertainment': 3,
     'Seat Comfort': 3,
     'Staff Service': 3,
     'Value for Money': 3}
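Applied to the per-category lookups from the question, the fix is to use select() (not select_one()), put span.star.fill after the :has() row filter instead of a comma-separated alternative, and take the last match. A minimal sketch with a hypothetical rating_for() helper and made-up rows holding 1, 2 and 3 filled stars; it assumes one review per soup (for several reviews, pass each review's element instead of the whole soup) and a soupsieve version that supports :-soup-contains():

```python
from bs4 import BeautifulSoup

# Hypothetical sample rows with 1, 2 and 3 filled stars.
html = '''
<table class="review-ratings">
<tr><td class="review-rating-header seat-comfort">Seat Comfort</td>
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td></tr>
<tr><td class="review-rating-header value-for-money">Value for Money</td>
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star fill">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td></tr>
<tr><td class="review-rating-header inflight-entertainment">Inflight Entertainment</td>
<td class="review-rating-stars stars"><span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span></td></tr>
</table>
'''
soup = BeautifulSoup(html, 'html.parser')

def rating_for(root, label):
    """Text of the last filled star in the row whose first cell contains `label`, or None."""
    # The row filter stays as in the question; span.star.fill (no space) picks the filled stars.
    selector = f'tr:has(td:first-child:-soup-contains("{label}")) span.star.fill'
    stars = root.select(selector)
    return stars[-1].get_text(strip=True) if stars else None

seat_comfort = rating_for(soup, 'Seat Comfort')
value_for_money = rating_for(soup, 'Value for Money')
inflight_entertainment = rating_for(soup, 'Inflight Entertainment')
print(seat_comfort, value_for_money, inflight_entertainment)  # 1 2 3
```

:-soup-contains() matches text case-sensitively, so the label has to match the header exactly (Value for Money, not Value For Money).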