python html regex beautifulsoup html-parsing

Python - BeautifulSoup - parse multiple span elements


I am trying to extract the value of the 'title' attribute from a 'span' element.

Using the code below as an example, the output I am looking for is 6536 and 9319, which come from the 'title' attribute of a span like this one:

<span aria-label="6536 users starred this repository" class="Counter js-social-count" data-plural-suffix="users starred this repository" data-singular-suffix="user starred this repository" data-turbo-replace="true" data-view-component="true" id="repo-stars-counter-star" title="6,536">6.5k</span>

I'm having trouble with the last line of the code, the get_text() call, which returns the element's text (such as 6.5k) rather than the 'title' value. I think we could potentially use regex to parse socialstars, but I'm not sure how.

from bs4 import BeautifulSoup
import requests

websites = ['https://github.com/marketplace/actions/yq-portable-yaml-processor','https://github.com/marketplace/actions/TruffleHog-OSS']

for links in websites:
    URL = requests.get(links)
    detailsoup = BeautifulSoup(URL.content, "html.parser")

    # Extract stars
    socialstars = detailsoup.findAll('span', {'class': 'Counter js-social-count'})
    socialstarsList = [socialstars.get_text() for socialstars in socialstars]

Solution

  • You already iterate over a list of URLs, and on each page the star counter has the same id (repo-stars-counter-star). So it is enough to select that single element with select_one() and read its 'title' attribute with get('title').

    from bs4 import BeautifulSoup
    import requests
    
    websites = ['https://github.com/marketplace/actions/yq-portable-yaml-processor','https://github.com/marketplace/actions/TruffleHog-OSS']
    
    for links in websites:
        URL = requests.get(links)
        detailsoup = BeautifulSoup(URL.content, "html.parser")
    
        # Extract stars
        socialstars = detailsoup.select_one('#repo-stars-counter-star').get('title')
        print(socialstars)
    

    Output:

     6,536
     9,319
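
If the plain numbers from the question (6536 rather than 6,536) are needed, the title string can be converted after it is extracted. The following is only a minimal sketch, not part of the original answer: the helper name star_count and the SAMPLE_HTML constant are hypothetical, the markup is a shortened copy of the span from the question, and the regex fallback on aria-label assumes that attribute always starts with the count.

```python
import re
from bs4 import BeautifulSoup

# Shortened copy of the span from the question, so no network access is needed.
SAMPLE_HTML = (
    '<span aria-label="6536 users starred this repository" '
    'class="Counter js-social-count" id="repo-stars-counter-star" '
    'title="6,536">6.5k</span>'
)


def star_count(html):
    """Return the star count as an int, or None if the counter is missing.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")
    counter = soup.select_one("#repo-stars-counter-star")
    if counter is None:
        return None

    title = counter.get("title")
    if title:
        # "6,536" -> 6536
        return int(title.replace(",", ""))

    # Fallback (assumption): aria-label begins with the raw count.
    match = re.match(r"\d+", counter.get("aria-label", ""))
    return int(match.group()) if match else None


print(star_count(SAMPLE_HTML))  # 6536
```

Inside the loop from the answer, the same helper could be called as star_count(URL.content). Returning None when the element is missing avoids the AttributeError that select_one(...).get('title') would raise if the page layout changes.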