In an earlier dataframe I have hundreds of URLs, and I iterate over them to extract the same information from each one.
Right now, I'm trying to scrape this div grid so that I get the row name on one side and its value on the other, and then convert the result into a dataframe.
I've tried to do it by position, but that doesn't work because the row names differ between pages and are not always the same.
I've tried something like this, but it doesn't get me anywhere; it's the wrong approach:
principalTable = soup.find_all('div', attrs={'class': 'info-table info-table--right-space'})
listPrueba = []   # row names
listPrueba2 = []  # values
areas = soup.find_all('span', attrs={'class': 'info-table__content info-table__content--regular'})
for i in areas:
    listPrueba.append(i.text.strip())
results = soup.find_all('span', attrs={'class': 'info-table__content info-table__content--bold'})
for i in results:
    listPrueba2.append(i.text.strip())
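One way to avoid relying on positions is to pair each label with the value right next to it, instead of filling two independent lists. A minimal sketch, assuming each `info-table__content--regular` span is immediately followed by its `info-table__content--bold` value span (the page's HTML isn't reproduced here, so the snippet below is a hypothetical stand-in):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real markup (structure is an assumption)
html = """
<div class="info-table info-table--right-space">
  <span class="info-table__content info-table__content--regular">Full name:</span>
  <span class="info-table__content info-table__content--bold">Ederson Santana de Moraes</span>
  <span class="info-table__content info-table__content--regular">Foot:</span>
  <span class="info-table__content info-table__content--bold">left</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
# class_ matches any single class in the multi-valued class attribute
table = soup.find("div", class_="info-table")

row = {}
for label in table.select(".info-table__content--regular"):
    # Look only at the very next <span> sibling of this label
    value = label.find_next_sibling("span")
    if value is not None and "info-table__content--bold" in value.get("class", []):
        row[label.get_text(strip=True)] = value.get_text(" ", strip=True)

print(row)
```

If a label has no value span right after it, that label is skipped instead of shifting every later pair.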
An example page is: "https://www.transfermarkt.com/ederson/profil/spieler/238223"
And the HTML code is the following:
To get the player's personal data into a pandas DataFrame, you can use the following example:
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://www.transfermarkt.com/ederson/profil/spieler/238223"
# Browser-like User-Agent; the site may refuse requests without one
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0"
}

soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")

# All cells of the personal-data box in document order: label, value, label, value, ...
data = iter(soup.select(".spielerdatenundfakten .info-table__content"))

all_data = []
# Passing the same iterator twice makes zip() yield consecutive (label, value) pairs
for a, b in zip(data, data):
    all_data.append(
        [a.get_text(strip=True), b.get_text(strip=True, separator=" ")]
    )

df = pd.DataFrame(all_data, columns=["Key", "Value"])
print(df.to_markdown(index=False))  # to_markdown() needs the tabulate package
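The `zip(data, data)` line works because both arguments are the same iterator: on each step `zip` pulls one item for `a` and then the next one for `b`, so the flat sequence label, value, label, value is consumed two at a time. A minimal illustration, with plain strings standing in for the selected tags:

```python
# Flat sequence alternating label, value, as .select() returns the cells
cells = ["Full name:", "Ederson Santana de Moraes", "Foot:", "left", "Outfitter:"]

it = iter(cells)
pairs = list(zip(it, it))  # same iterator twice -> consecutive pairs
print(pairs)
```

Note that a trailing unpaired cell (here `"Outfitter:"`) is silently dropped, and a missing cell in the middle would shift every later pair, so this relies on the box always alternating label and value.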
Prints:
| Key | Value |
|---|---|
| Full name: | Ederson Santana de Moraes |
| Date of birth: | Aug 17, 1993 |
| Place of birth: | Osasco (SP) |
| Age: | 28 |
| Height: | 1,88 m |
| Citizenship: | Brazil Portugal |
| Position: | Goalkeeper |
| Foot: | left |
| Player agent: | Gestifute |
| Current club: | Manchester City |
| Joined: | Jul 1, 2017 |
| Contract expires: | Jun 30, 2026 |
| Date of last contract extension: | Sep 1, 2021 |
| Outfitter: | Puma |
| Social-Media: | |
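Because the set of rows varies from player to player, one way to combine hundreds of pages is to turn each page into a dict and let pandas align the columns, filling missing fields with NaN. A sketch, assuming a hypothetical `parse_profile` helper that applies the same selector as above to an HTML string; the two snippets are hypothetical stand-ins for downloaded pages, and in practice each one would come from `requests.get(url, headers=headers).content` for a URL in your dataframe:

```python
import pandas as pd
from bs4 import BeautifulSoup


def parse_profile(html):
    """Hypothetical helper: return {label: value} for one profile page."""
    soup = BeautifulSoup(html, "html.parser")
    cells = iter(soup.select(".spielerdatenundfakten .info-table__content"))
    return {
        a.get_text(strip=True): b.get_text(strip=True, separator=" ")
        for a, b in zip(cells, cells)
    }


# Hypothetical page snippets; the second player has no "Foot:" row
pages = {
    "player_a": """<div class="spielerdatenundfakten">
        <span class="info-table__content">Position:</span>
        <span class="info-table__content">Goalkeeper</span>
        <span class="info-table__content">Foot:</span>
        <span class="info-table__content">left</span></div>""",
    "player_b": """<div class="spielerdatenundfakten">
        <span class="info-table__content">Position:</span>
        <span class="info-table__content">Defender</span></div>""",
}

rows = []
for name, html in pages.items():
    row = parse_profile(html)
    row["source"] = name  # remember which page the row came from
    rows.append(row)

# One row per player; columns are the union of all labels, gaps become NaN
df = pd.DataFrame(rows)
print(df)
```

Keying by label rather than position means a row that appears on only some pages simply becomes a column with NaN for the other players.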