
Extract info from div class grid with same span


I have a dataframe with hundreds of URLs, and I am iterating over them to extract the same information from each page.

Right now I'm trying to scrape this div class grid so that I get the row name on one side and its value on the other. The result should then be converted into a dataframe.

I've tried to do it by position, but that doesn't work because the row names change from page to page and are not always the same.

I've tried something like this, but it doesn't get me anywhere. It's the wrong approach:

principalTable = soup.find_all('div', attrs={'class': 'info-table info-table--right-space'})
# Row names and values are collected into two separate lists
listPrueba, listPrueba2 = [], []
areas = soup.find_all('span', attrs={'class': 'info-table__content info-table__content--regular'})
for i in areas:
    listPrueba.append(i.text.strip())
results = soup.find_all('span', attrs={'class': 'info-table__content info-table__content--bold'})
for i in results:
    listPrueba2.append(i.text.strip())

An example page is: "https://www.transfermarkt.com/ederson/profil/spieler/238223"

The HTML of the table was posted as a screenshot, which is not reproduced here.


Solution

  • To get the player's personal data into a pandas DataFrame, you can use the following example:

    import requests
    import pandas as pd
    from bs4 import BeautifulSoup


    url = "https://www.transfermarkt.com/ederson/profil/spieler/238223"
    # A browser-like User-Agent is sent with the request
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0"
    }
    soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")

    # All label and value spans of the info table, in document order
    data = iter(soup.select(".spielerdatenundfakten .info-table__content"))

    all_data = []
    # zip(data, data) reads the same iterator twice per step,
    # so consecutive spans are paired as (label, value)
    for a, b in zip(data, data):
        all_data.append(
            [a.get_text(strip=True), b.get_text(strip=True, separator=" ")]
        )

    df = pd.DataFrame(all_data, columns=["Key", "Value"])
    # to_markdown() needs the optional "tabulate" package
    print(df.to_markdown(index=False))
    

    Prints:

    | Key                              | Value                     |
    |:---------------------------------|:--------------------------|
    | Full name:                       | Ederson Santana de Moraes |
    | Date of birth:                   | Aug 17, 1993              |
    | Place of birth:                  | Osasco (SP)               |
    | Age:                             | 28                        |
    | Height:                          | 1,88 m                    |
    | Citizenship:                     | Brazil Portugal           |
    | Position:                        | Goalkeeper                |
    | Foot:                            | left                      |
    | Player agent:                    | Gestifute                 |
    | Current club:                    | Manchester City           |
    | Joined:                          | Jul 1, 2017               |
    | Contract expires:                | Jun 30, 2026              |
    | Date of last contract extension: | Sep 1, 2021               |
    | Outfitter:                       | Puma                      |
    | Social-Media:                    |                           |
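
Since the question mentions hundreds of URLs whose row names differ, one possible way to combine the results is to turn each page's key/value pairs into a dict and build a single DataFrame from the list of dicts. The following is only a minimal sketch under assumptions: `pairs_to_record` is a hypothetical helper, and the two input lists are hand-written sample texts standing in for the scraped spans (the second player's values are made up for illustration). No network access is involved.

```python
import pandas as pd


def pairs_to_record(texts):
    """Turn a flat [key, value, key, value, ...] list into a dict.

    Hypothetical helper, not part of the original answer.
    """
    it = iter(texts)
    # zip(it, it) takes two consecutive items per step from the same iterator
    return {key.rstrip(":"): value for key, value in zip(it, it)}


# Sample texts in the order the spans would appear on each page.
# player_b is invented data with a different set of rows.
player_a = ["Full name:", "Ederson Santana de Moraes", "Foot:", "left"]
player_b = ["Foot:", "right", "Outfitter:", "Puma"]

records = [pairs_to_record(p) for p in (player_a, player_b)]

# One row per player; a key missing for a player becomes NaN
df = pd.DataFrame(records)
print(df)
```

Because each record is keyed by row name rather than by position, pages with extra or missing rows still line up in the right columns.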