python, pandas, web-scraping

I'm getting an empty dataframe trying to web scrape html code. Why?


Trying to use Python 3.x and pandas to scrape salary data from Basketball-Reference. I'm not getting any error messages, but I have no output. I want the second and fourth columns from the table: 'Player' and salary '2019-20'. What am I doing wrong?

This is what I have so far:

# URL of the page we will be scraping
salaries_url = 'https://www.basketball-reference.com/contracts/players.html'
salaries_response = requests.get(salaries_url)
page = salaries_response.text

# this is the HTML from the given URL
soup = BeautifulSoup(html)

# This takes the player salary data and creates a list of lists, where each inner list holds one player's values
salaries = []
for x in soup.find_all('tr')[2:]:
    tds_salaries = x.find_all('td')
    name_s = tds_salaries[0].text
    salary = tds_salaries[2].text
    salaries.append([name_s, salary[1:]])

#create a salary pandas dataframe
salaries_df = pd.DataFrame(salaries, columns=['name', 'salary'])

salaries_df.head()


Solution

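  • It worked fine here. All I did was put a try/except inside the for loop to skip the table header rows, which contain no td cells. Note that I also pass page (the variable holding the response text) to BeautifulSoup; the question's snippet passes html, which it never defines.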

    Code

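    import requests
    import pandas as pd
    from bs4 import BeautifulSoup
    
    salaries_url = 'https://www.basketball-reference.com/contracts/players.html'
    salaries_response = requests.get(salaries_url)
    page = salaries_response.text
    
    # Parse the response text; naming the parser avoids bs4's "no parser specified" warning
    soup = BeautifulSoup(page, 'html.parser')
    
    salaries = []
    for x in soup.find_all('tr')[2:]:
        try:
            tds_salaries = x.find_all('td')
            name_s = tds_salaries[0].text
            salary = tds_salaries[2].text
            # Drop the first character of the salary string (the currency sign)
            salaries.append([name_s, salary[1:]])
        except IndexError:
            # Header rows have no td cells, so indexing fails; skip them
            print('This is a header!')
    
    salaries_df = pd.DataFrame(salaries, columns=['name', 'salary'])
    
    print(salaries_df)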
    

    Output

                      name      salary
    0        Stephen Curry  40,231,758
    1    Russell Westbrook  38,506,482
    2           Chris Paul  38,506,482
    3            John Wall  38,199,000
    4         James Harden  38,199,000
    ..                 ...         ...
    570    Hollis Thompson      50,000
    571         Tyler Ulis      50,000
    572  Demetrius Jackson      18,312
    573    Jordan Caroline       6,000
    574    Anthony Bennett       6,000
    
    [575 rows x 2 columns]
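
As an alternative to catching IndexError, a rough sketch: check how many td cells a row has before indexing into it, so header rows are skipped explicitly. The inline HTML here is a made-up stand-in for the real page so the snippet runs without a network request. The names SAMPLE_HTML and parse_salaries are hypothetical, and the row layout (rank in a th cell, then player, team and salary in td cells) is an assumption inferred from the indexes used in the code above, not something checked against the live site.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical, trimmed-down stand-in for the page's table markup.
# Header rows use only <th> cells and one header row repeats mid-table.
SAMPLE_HTML = """
<table>
  <tr><th></th><th></th><th></th><th>Salary</th></tr>
  <tr><th>Rk</th><th>Player</th><th>Tm</th><th>2019-20</th></tr>
  <tr><th>1</th><td>Stephen Curry</td><td>GSW</td><td>$40,231,758</td></tr>
  <tr><th>2</th><td>Chris Paul</td><td>OKC</td><td>$38,506,482</td></tr>
  <tr><th>Rk</th><th>Player</th><th>Tm</th><th>2019-20</th></tr>
  <tr><th>3</th><td>John Wall</td><td>WAS</td><td>$38,199,000</td></tr>
</table>
"""


def parse_salaries(html):
    """Return a DataFrame of [name, salary] built from rows that have td cells."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for tr in soup.find_all('tr'):
        tds = tr.find_all('td')
        # Header rows contain no <td> cells, so skip anything too short to index
        if len(tds) < 3:
            continue
        name = tds[0].get_text(strip=True)
        # Remove the leading '$' if present, keeping the thousands separators
        salary = tds[2].get_text(strip=True).lstrip('$')
        rows.append([name, salary])
    return pd.DataFrame(rows, columns=['name', 'salary'])


salaries_df = parse_salaries(SAMPLE_HTML)
print(salaries_df)
```

To run it against the real page, pass salaries_response.text to parse_salaries instead of SAMPLE_HTML. Checking the cell count also removes the need for the [2:] slice, since every header row is filtered by the same test.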