Tags: python-3.x, web-scraping, beautifulsoup, html-parsing

How to extract specific elements from multiple lists with BeautifulSoup?


I'm having trouble extracting some specific tags (and their string content) and storing them in variables, so that I can write these variables to a CSV file later.

from bs4 import BeautifulSoup
from requests_html import HTMLSession
session = HTMLSession()
r = session.get('https://www.khanacademy.org/profile/DFletcher1990/')
r.html.render(sleep=5)
soup=BeautifulSoup(r.html.html,'html.parser')

user_info_table=soup.find('table', class_='user-statistics-table')

for tr in user_info_table.find_all('tr'):
    tds=tr.find_all('td')
    print(tds)

I would like to collect:

  • "4 years ago" and store it in a variable called date,
  • "932,915" and store it in a variable called points,
  • "372" and store it in a variable called videos.

I don't really understand how bs4.element.ResultSet behaves.


Solution

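  • A ResultSet is a subclass of Python's list, so you can treat it like a list: index it, iterate over it, or unpack it. In this table the value sits in the second <td> of each row, so take index 1 from every row and unpack the three results into your variables.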

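    from bs4 import BeautifulSoup
    from requests_html import HTMLSession

    session = HTMLSession()
    r = session.get('https://www.khanacademy.org/profile/DFletcher1990/')
    # render() executes the page's JavaScript so the statistics table is in the HTML
    r.html.render(sleep=10)
    soup = BeautifulSoup(r.html.html, 'html.parser')
    user_info_table = soup.find('table', class_='user-statistics-table')
    # find_all() returns a ResultSet (a list subclass): take the second <td>
    # of each row; unpacking assumes the table has exactly three rows
    date, points, videos = [tr.find_all('td')[1].get_text(strip=True)
                            for tr in user_info_table.find_all('tr')]
    print(date, points, videos, sep="\n")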
    

    Output

    4 years ago
    932,915
    372
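To see how a ResultSet behaves without rendering the live page, here is a minimal offline sketch. The HTML string is a hypothetical stand-in: only the table class and the three values come from this post, while the row labels are made up and the real page's markup may differ. It also shows one way to get the three variables into CSV form with the standard csv module, writing to an in-memory buffer instead of a real file.

```python
import csv
import io

from bs4 import BeautifulSoup
from bs4.element import ResultSet

# Hypothetical stand-in for the rendered table: the row labels are
# assumptions; only the class name and the values come from the post.
HTML = """
<table class="user-statistics-table">
  <tr><td>Date joined</td><td> 4 years ago </td></tr>
  <tr><td>Energy points earned</td><td>932,915</td></tr>
  <tr><td>Videos completed</td><td>372</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
table = soup.find("table", class_="user-statistics-table")
rows = table.find_all("tr")

# A ResultSet is a list subclass, so len(), indexing and slicing all work.
print(isinstance(rows, ResultSet), isinstance(rows, list), len(rows))  # → True True 3

# Unpacking requires exactly as many rows as target names.
date, points, videos = [tr.find_all("td")[1].get_text(strip=True) for tr in rows]
print(date, points, videos, sep="\n")

# Write a header and the three values as CSV records (in memory here).
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["date", "points", "videos"])
writer.writerow([date, points, videos])
print(buf.getvalue())
```

Note that csv.writer quotes "932,915" because it contains the delimiter; if you need a number instead of a string, int(points.replace(",", "")) gives 932915.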