I am new to scraping/coding and could use some help if possible.
from bs4 import BeautifulSoup
import requests
import pandas as pd
page_link ='https://www.baseball-reference.com/previews/index.shtml'
page_response = requests.get(page_link, timeout=5)
soup = BeautifulSoup(page_response.content, "html.parser")
I need help finding the appropriate way to extract each pitcher's name and team.
(examples only:)
player_name = [i.text for i in soup.find_all('td', {'href': 'example-name'})]
team = [i.text for i in soup.find_all('td', {'href': 'example-team'})]
Here is where I export to Excel:
my_dict = dict(zip(player_name, team))
df = pd.DataFrame(pd.Series(my_dict))
# writer.save() was removed in pandas 2.0; the context manager saves and closes the file
with pd.ExcelWriter('pitching_webscrape.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1')
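The dict(zip(...)) approach keys the data by pitcher name, so two pitchers with the same name would collapse into one row, and the sheet ends up with the names in the index and a team column headed "0". A minimal sketch of a two-column layout instead, assuming you already have a list of (team, pitcher) tuples; pairs_to_frame and export_to_excel are hypothetical helper names, and writing .xlsx needs an Excel engine such as openpyxl installed.

```python
import pandas as pd


def pairs_to_frame(pairs):
    """Build a two-column DataFrame from (team, pitcher) tuples."""
    return pd.DataFrame(pairs, columns=["Team", "Pitcher"])


def export_to_excel(pairs, path="pitching_webscrape.xlsx"):
    """Hypothetical helper: write the pairs to an .xlsx file.

    Requires an Excel engine (e.g. openpyxl) to be installed.
    """
    df = pairs_to_frame(pairs)
    # The context manager saves and closes the workbook when the block ends
    with pd.ExcelWriter(path) as writer:
        df.to_excel(writer, sheet_name="Sheet1", index=False)


# Sample pairs taken from the output shown in the question
pairs = [("TOR", "Sam Gaviglio"), ("BAL", "David Hess")]
print(pairs_to_frame(pairs))
# export_to_excel(pairs)  # uncomment to write pitching_webscrape.xlsx
```

Passing index=False keeps the row numbers out of the spreadsheet, so the sheet contains only the Team and Pitcher columns.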
I would like each pitcher's name and team exported to Excel. Thanks in advance for your help! Please let me know if I can improve my question or add more details.
Here is the code I had so far:
My first code:
t = soup.find_all('td')
print(t)
My first output:
[Blue Jays (60-70) , ,
Preview
, Orioles (37-94) , , 7:05PM
, TOR, Sam Gaviglio
(#43, 28, RHP, 3-6, 4.94), BAL, David Hess
(#41, 24, RHP, 2-8, 5.50), White Sox (51-79) , ,
My second code:
t = soup.find_all('td')
for a in t:
    print(a.text)
My second output:
Blue Jays (60-70)
Preview
Orioles (37-94)
7:05PM
TOR Sam Gaviglio(#43, 28, RHP, 3-6, 4.94) BAL David Hess(#41, 24, RHP, 2-8, 5.50) White Sox (51-79)
I am getting closer; however, I only want the players' names and the teams' names (e.g. TOR, Sam Gaviglio). I also want this exported to Excel. Thanks! =)
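Because the cell text already has the shape "TEAM Name(#number, ...)", one option is a regular expression over each cell's text. This is a minimal sketch: the pattern is inferred only from the sample output above (a 2-3 letter uppercase team code, whitespace, the name, then "(#"), so it is an assumption about the page that may need adjusting. PITCHER_RE and extract_pitchers are hypothetical names.

```python
import re

# Assumed pattern, based only on the sample text in the question:
# 2-3 uppercase letters (team code), whitespace, the pitcher's name,
# optional whitespace, then "(#<jersey number>".
PITCHER_RE = re.compile(r"\b([A-Z]{2,3})\s+([^(\n]+?)\s*\(#\d+")


def extract_pitchers(text):
    """Return (team, pitcher) tuples found in a block of cell text."""
    return [(m.group(1), m.group(2).strip()) for m in PITCHER_RE.finditer(text)]


sample = ("TOR Sam Gaviglio(#43, 28, RHP, 3-6, 4.94) "
          "BAL David Hess(#41, 24, RHP, 2-8, 5.50) White Sox (51-79)")
print(extract_pitchers(sample))
# [('TOR', 'Sam Gaviglio'), ('BAL', 'David Hess')]
```

To apply it to the page, you could collect the pairs cell by cell: pairs = [p for td in soup.find_all('td') for p in extract_pitchers(td.text)].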
If you just want a single list of players and teams, then this should suffice:
import re

players_and_teams = []
for i in soup.find_all('td'):
    # Look at every link inside each table cell
    for link in i.find_all('a'):
        # Skip the "Preview" links and keep all other link text
        if not re.findall(r'Preview', link.text):
            players_and_teams.append(link.text)
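That list is flat. If you want each team next to its pitcher (for example, to write two Excel columns), one option is to walk the list and pair each team code with the item that follows it. This is a sketch under two assumptions you should check against your actual list: team codes are 2-3 uppercase letters, and each code's link is immediately followed by that team's pitcher link. pair_teams_with_pitchers and the example list are hypothetical.

```python
import re

# Assumption: team codes look like "TOR" or "BAL" (2-3 uppercase letters)
TEAM_CODE = re.compile(r"[A-Z]{2,3}")


def pair_teams_with_pitchers(items):
    """Turn a flat list of link texts into (team, pitcher) tuples.

    Items that are not team codes (e.g. full team names) are skipped;
    each team code is paired with the item right after it.
    """
    cleaned = [item.strip() for item in items]
    pairs = []
    i = 0
    while i < len(cleaned) - 1:
        if TEAM_CODE.fullmatch(cleaned[i]):
            pairs.append((cleaned[i], cleaned[i + 1]))
            i += 2  # consume both the team code and the pitcher name
        else:
            i += 1  # not a team code; move on
    return pairs


# Made-up list shaped like the question's output
example = ["Blue Jays", "TOR", "Sam Gaviglio", "Orioles", "BAL", "David Hess"]
print(pair_teams_with_pitchers(example))
# [('TOR', 'Sam Gaviglio'), ('BAL', 'David Hess')]
```

From there, pd.DataFrame(pair_teams_with_pitchers(players_and_teams), columns=['Team', 'Pitcher']).to_excel('pitching_webscrape.xlsx', index=False) writes the two columns to Excel (an Excel engine such as openpyxl must be installed).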