Tags: python-3.x, web-scraping, beautifulsoup, html-parsing

How to extract specific strings from a list and store them in variables with BeautifulSoup?


I would like to extract specific strings from a list of items, each containing several tags (and strings), and store them in variables.

from bs4 import BeautifulSoup
from requests_html import HTMLSession

session = HTMLSession()
r = session.get('https://www.khanacademy.org/profile/DFletcher1990/')
r.html.render(sleep=5)  # execute the page's JavaScript (waiting 5 s) so the stats appear in the HTML

soup = BeautifulSoup(r.html.html, 'html.parser')

# every statistic sits in its own <div class="discussion-stat">
user_socio_table = soup.find_all('div', class_='discussion-stat')
print(user_socio_table)

Here is the expected output of print(user_socio_table):

[<div class="discussion-stat">
            4<span class="discussion-light"> questions</span>
</div>, <div class="discussion-stat">
            444<span class="discussion-light"> votes</span>
</div>, <div class="discussion-stat">
            718<span class="discussion-light"> answers</span>
</div>, <div class="discussion-stat">
            15<span class="discussion-light"> flags raised</span>
</div>, <div class="discussion-stat">
            10<span class="discussion-light"> project help requests</span>
</div>, <div class="discussion-stat">
            38<span class="discussion-light"> project help replies</span>
</div>, <div class="discussion-stat">
            208<span class="discussion-light"> comments</span>
</div>, <div class="discussion-stat">
            11<span class="discussion-light"> tips and thanks</span>
</div>]
  • I would like to store 4 into a variable called questions,
  • I would like to store 444 into a variable called votes,
  • I would like to store 718 into a variable called answers,
  • I would like to store 15 into a variable called flags,
  • I would like to store 10 into a variable called help_requests,
  • I would like to store 38 into a variable called help_replies,
  • I would like to store 208 into a variable called comments,
  • I would like to store 11 into a variable called tips_thanks.

Thanks for your help!


Solution

  • You can read the values one by one and add them to a dictionary:

    data = {}
    for stat in user_socio_table:
        span = stat.find('span')
        label = span.text.strip()               # label inside the span, e.g. 'questions'
        number = span.previous_sibling.strip()  # text node just before the span, e.g. '4'
        data[label] = number                    # store label -> number

    print(data)
    

    Output:

    {'questions': '4', 'votes': '444', 'answers': '718', 'flags raised': '15', 'project help requests': '10', 'project help replies': '38', 'comments': '208', 'tips and thanks': '11'}
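To try the loop without rendering the live page, here is a minimal sketch that parses a trimmed copy of the HTML shown in the question. The SAMPLE_HTML string (only the first three stats) and the parse_stats helper are hypothetical names introduced for illustration, and the sketch additionally converts each count to int, which the loop above does not do. It assumes beautifulsoup4 is installed and that each count is the text node directly before the span.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: a trimmed copy of the markup printed in the question.
SAMPLE_HTML = """
<div class="discussion-stat">
            4<span class="discussion-light"> questions</span>
</div>
<div class="discussion-stat">
            444<span class="discussion-light"> votes</span>
</div>
<div class="discussion-stat">
            718<span class="discussion-light"> answers</span>
</div>
"""


def parse_stats(html):
    """Return a dict mapping each stat label to its count as an int."""
    soup = BeautifulSoup(html, "html.parser")
    stats = {}
    for div in soup.find_all("div", class_="discussion-stat"):
        span = div.find("span")
        label = span.get_text(strip=True)       # e.g. "questions"
        count = span.previous_sibling.strip()   # text node before the span, e.g. "4"
        stats[label] = int(count)
    return stats


print(parse_stats(SAMPLE_HTML))
# → {'questions': 4, 'votes': 444, 'answers': 718}
```

Testing against a fixed string like this lets you check the parsing logic separately from the JavaScript rendering step.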
    

    You can then look up a specific value in data by its key:

    print(data['questions'])
    

    Output:

    4
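The question asks for separate variables rather than a dictionary. A minimal sketch, assuming data has exactly the keys shown in the output above (if the site renames a label, the lookup raises KeyError), assigns each count to the variable name requested in the question and converts it to int:

```python
# data as printed by the loop above (values are strings)
data = {'questions': '4', 'votes': '444', 'answers': '718',
        'flags raised': '15', 'project help requests': '10',
        'project help replies': '38', 'comments': '208',
        'tips and thanks': '11'}

# unpack into the variable names from the question, converting to int
questions = int(data['questions'])
votes = int(data['votes'])
answers = int(data['answers'])
flags = int(data['flags raised'])
help_requests = int(data['project help requests'])
help_replies = int(data['project help replies'])
comments = int(data['comments'])
tips_thanks = int(data['tips and thanks'])

print(questions, votes, tips_thanks)  # → 4 444 11
```

Keeping the values in the dictionary is usually more flexible. Separate variables only make sense when the set of stats is fixed and known in advance.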