python beautifulsoup screen-scraping

Not scraping website data with beautifulsoup


This is the third or fourth time I have used BeautifulSoup. I am using it together with the requests library to scrape data from a sports website. I am trying to scrape athletes' info such as name, age, height, etc. However, when I print the result (print(player_name)), I get this instead of what is displayed on the website page:

Name:{{details.player.person.lastName}}, {{details.player.person.firstName}}

Is there any way of accessing the real data?

My code:

import requests
from bs4 import BeautifulSoup

def scrape_player(player_url):
    response_player = requests.get(player_url)
    player_soup = BeautifulSoup(response_player.text, 'html.parser')
    div = player_soup.find('div', {'class': 'player-info-row'})
    player_name = div.text
    print(player_name)


if __name__ == '__main__':
    scrape_player('https://ehfcl.eurohandball.com/men/20212/player/LFpFsiLDFvxs_tXnKlFAQw/luis-frade/')

Solution

  • The website loads the data dynamically, so the HTML returned by requests contains only the template placeholders, and bs4 cannot capture the values via tags or classes. The data is, however, also present in a script tag in the page source:

    import requests
    from bs4 import BeautifulSoup
    url = "https://ehfcl.eurohandball.com/men/2021-22/player/Z8PG_QqFxhA-6PTQ4gcCSA/stas-skube/"
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "html.parser")
    

    Here we can find that script tag and parse its contents as JSON, which returns the data as key-value pairs from which you can extract whatever fields you want:

    import json

    # The JSON-LD script tag carries the player data that the visible HTML lacks
    data = soup.find("script", attrs={"type": "application/ld+json"})
    main_data = json.loads(data.string)

    print(main_data['name'])
    print(main_data['birthDate'])
    

    Output:

    Skube Stas
    1989-11-15
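
To see the difference between the placeholder markup and the embedded JSON without hitting the network, here is a minimal offline sketch. The `SAMPLE_HTML` string is a made-up, heavily reduced stand-in for the real page (only the placeholder line and the two JSON fields come from the question and answer above), and `extract_json_ld` is a hypothetical helper name. The helper also guards against the script tag being missing or containing invalid JSON, which the snippet above does not.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical, reduced page source: a templated div plus a JSON-LD script tag.
SAMPLE_HTML = """
<html><body>
<div class="player-info-row">Name:{{details.player.person.lastName}}, {{details.player.person.firstName}}</div>
<script type="application/ld+json">
{"@type": "Person", "name": "Skube Stas", "birthDate": "1989-11-15"}
</script>
</body></html>
"""


def extract_json_ld(html):
    """Return the parsed content of the first JSON-LD script tag, or None."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("script", attrs={"type": "application/ld+json"})
    # The tag may be absent, or empty, on pages that do not embed JSON-LD.
    if tag is None or not tag.string:
        return None
    try:
        return json.loads(tag.string)
    except json.JSONDecodeError:
        return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# The visible markup still holds the unrendered template placeholders.
placeholder_text = soup.find("div", {"class": "player-info-row"}).text
# The JSON-LD block holds the actual values.
player = extract_json_ld(SAMPLE_HTML)

print(placeholder_text)
print(player["name"], player["birthDate"])
```

On this sample, the first print shows the `{{...}}` placeholders from the question and the second shows the name and birth date. With a real page, pass `requests.get(url).text` to `extract_json_ld` and check for `None` before indexing. Which keys are available depends on what the site puts in that script tag.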