I'm trying to use Python 3.x and pandas to scrape salary data from Basketball-Reference. I get no error messages, but I also get no output. I want the second and fourth columns from the table: 'Player' and the '2019-20' salary. What am I doing wrong?
This is what I have so far:
# URL of the page we will be scraping
salaries_url = 'https://www.basketball-reference.com/contracts/players.html'
salaries_response = requests.get(salaries_url)
page = salaries_response.text
# this is the HTML from the given URL
soup = BeautifulSoup(html)
# This takes the player salary data and creates a list of lists, where each inner list holds the values for one player
salaries = []
for x in soup.find_all('tr')[2:]:
    tds_salaries = x.find_all('td')
    name_s = tds_salaries[0].text
    salary = tds_salaries[2].text
    salaries.append([name_s, salary[1:]])
# create a pandas DataFrame of salaries
salaries_df = pd.DataFrame(salaries, columns=['name', 'salary'])
salaries_df.head()
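It worked fine here. All I did was put a try/except inside the for loop to skip the table header rows. I also passed page (the response text) to BeautifulSoup, since html is not defined anywhere in the snippet you posted.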
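import pandas as pd
import requests
from bs4 import BeautifulSoup

salaries_url = 'https://www.basketball-reference.com/contracts/players.html'
salaries_response = requests.get(salaries_url)
page = salaries_response.text
# name the parser explicitly to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(page, 'html.parser')
salaries = []
for x in soup.find_all('tr')[2:]:
    try:
        tds_salaries = x.find_all('td')
        name_s = tds_salaries[0].text
        salary = tds_salaries[2].text
        # drop the leading '$' from the salary string
        salaries.append([name_s, salary[1:]])
    except IndexError:
        # rows without enough <td> cells (the headers) end up here
        print('This is a header!')
salaries_df = pd.DataFrame(salaries, columns=['name', 'salary'])
print(salaries_df)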
Output:
                  name      salary
0        Stephen Curry  40,231,758
1    Russell Westbrook  38,506,482
2           Chris Paul  38,506,482
3            John Wall  38,199,000
4         James Harden  38,199,000
..                 ...         ...
570    Hollis Thompson      50,000
571         Tyler Ulis      50,000
572  Demetrius Jackson      18,312
573    Jordan Caroline       6,000
574    Anthony Bennett       6,000

[575 rows x 2 columns]
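If you would rather not rely on catching IndexError, one alternative is to check the number of cells explicitly and, while you are at it, turn the salary strings into integers so the column can be sorted or summed. This is only a rough sketch: parse_salary and rows_to_records are hypothetical helper names, the sample rows stand in for the text of each row's td cells, and 'TM' is a placeholder for the team cell. It assumes the header rows produce an empty td list, which is what the IndexError above suggests.

```python
def parse_salary(text):
    """Turn a string such as '$40,231,758' into the integer 40231758."""
    return int(text.strip().lstrip('$').replace(',', ''))


def rows_to_records(rows):
    """Keep only rows that have both a name cell and a salary cell."""
    records = []
    for cells in rows:
        # Assumption: header rows yield no <td> cells, so they are too short.
        if len(cells) < 3:
            continue
        # Skip rows whose salary cell is blank instead of failing on int('').
        if not cells[2].strip():
            continue
        records.append([cells[0], parse_salary(cells[2])])
    return records


# Sample data standing in for [td.text for td in x.find_all('td')];
# 'TM' is a placeholder for the team cell.
sample_rows = [
    [],
    ['Stephen Curry', 'TM', '$40,231,758'],
    [],
    ['Anthony Bennett', 'TM', '$6,000'],
]
records = rows_to_records(sample_rows)
print(records)
```

In the real loop you would build each cells list from x.find_all('td') and pass the resulting records to pd.DataFrame exactly as before.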