python-3.x beautifulsoup urlopen

BeautifulSoup and urlopen aren't fetching the right table


I'm trying to practice BeautifulSoup and urlopen using Basketball-Reference datasets. Getting an individual player's stats works fine, but when I use the same code for team stats, it apparently doesn't find the right table.

The following code gets the column headers from the page:


from urllib.request import urlopen
from bs4 import BeautifulSoup

def fetch_years():
  # League summary page for the 1999-2000 season
  url = "https://www.basketball-reference.com/leagues/NBA_2000.html?sr&utm_source=direct&utm_medium=Share&utm_campaign=ShareTool#team-stats-per_game::none"
  html = urlopen(url)
  soup = BeautifulSoup(html, "html.parser")

  # Header cells of the first <tr> on the page, minus the leading column
  headers = [th.get_text() for th in soup.find_all('tr')[0].find_all('th')]
  headers = headers[1:]
  print(headers)

I'm trying to get the headers of the team per-game stats table, in a format like:

['Tm', 'G', 'MP', 'FG', ...]

Instead, the header data I'm getting is:

['W', 'L', 'W/L%', ...] 

which are the headers of the very first table on the 1999-2000 season page (the one named 'Division Standings').

If you use that same code for a player's data such as this one, you get the result I'm looking for:

  Age   Tm   Lg Pos   G  GS    MP   FG  ...  DRB  TRB  AST  STL  BLK  TOV   PF   PTS
0  20  OKC  NBA  PG  82  65  32.5  5.3  ...  2.7  4.9  5.3  1.3  0.2  3.3  2.3  15.3
1  21  OKC  NBA  PG  82  82  34.3  5.9  ...  3.1  4.9  8.0  1.3  0.4  3.3  2.5  16.1
2  22  OKC  NBA  PG  82  82  34.7  7.5  ...  3.1  4.6  8.2  1.9  0.4  3.9  2.5  21.9
3  23  OKC  NBA  PG  66  66  35.3  8.8  ...  3.1  4.6  5.5  1.7  0.3  3.6  2.2  23.6
4  24  OKC  NBA  PG  82  82  34.9  8.2  ...  3.9  5.2  7.4  1.8  0.3  3.3  2.3  23.2

The web-scraping code originally came from here.


Solution

  • The sports-reference.com sites are trickier than your standard ones. With the exception of a few tables, the tables are rendered after the page loads, so one option is to use Selenium to let the page render first, then pull the HTML source code.

    The other option: if you look at the HTML source, you'll see those tables sit inside HTML comments. You can use BeautifulSoup to pull out the comment nodes, then search through those for the table tags.

    The function below returns a list of DataFrames, and the Team Per Game stats are the table at index position 1:

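    from io import StringIO

    import pandas as pd
    import requests
    from bs4 import BeautifulSoup, Comment

    def fetch_years():
        # League summary page for the 1999-2000 season
        url = "https://www.basketball-reference.com/leagues/NBA_2000.html?sr&utm_source=direct&utm_medium=Share&utm_campaign=ShareTool#team-stats-per_game::none"
        response = requests.get(url)

        soup = BeautifulSoup(response.text, "html.parser")
        # Collect every HTML comment node; most tables are hidden inside them
        comments = soup.find_all(string=lambda text: isinstance(text, Comment))

        tables = []
        for comment in comments:
            if "<table" in comment:
                try:
                    # Wrap in StringIO: newer pandas versions deprecate raw HTML strings
                    tables.append(pd.read_html(StringIO(str(comment)))[0])
                except Exception:
                    # Skip comments whose markup pandas cannot parse as a table
                    continue
        return tables

    tables = fetch_years()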
    

    Output:

    print(tables[1].to_string())
          Rk                     Team   G     MP    FG   FGA    FG%   3P   3PA    3P%    2P   2PA    2P%    FT   FTA    FT%   ORB   DRB   TRB   AST  STL  BLK   TOV    PF    PTS
    0    1.0        Sacramento Kings*  82  241.5  40.0  88.9  0.450  6.5  20.2  0.322  33.4  68.7  0.487  18.5  24.6  0.754  12.9  32.1  45.0  23.8  9.6  4.6  16.2  21.1  105.0
    1    2.0         Detroit Pistons*  82  241.8  37.1  80.9  0.459  5.4  14.9  0.359  31.8  66.0  0.481  23.9  30.6  0.781  11.2  30.0  41.2  20.8  8.1  3.3  15.7  24.5  103.5
    2    3.0         Dallas Mavericks  82  240.6  39.0  85.9  0.453  6.3  16.2  0.391  32.6  69.8  0.468  17.2  21.4  0.804  11.4  29.8  41.2  22.1  7.2  5.1  13.7  21.6  101.4
    3    4.0          Indiana Pacers*  82  240.6  37.2  81.0  0.459  7.1  18.1  0.392  30.0  62.8  0.478  19.9  24.5  0.811  10.3  31.9  42.1  22.6  6.8  5.1  14.1  21.8  101.3
    4    5.0         Milwaukee Bucks*  82  242.1  38.7  83.3  0.465  4.8  13.0  0.369  33.9  70.2  0.483  19.0  24.2  0.786  12.4  28.9  41.3  22.6  8.2  4.6  15.0  24.6  101.2
    5    6.0      Los Angeles Lakers*  82  241.5  38.3  83.4  0.459  4.2  12.8  0.329  34.1  70.6  0.482  20.1  28.9  0.696  13.6  33.4  47.0  23.4  7.5  6.5  13.9  22.5  100.8
    6    7.0            Orlando Magic  82  240.9  38.6  85.5  0.452  3.6  10.6  0.338  35.1  74.9  0.468  19.2  26.1  0.735  14.0  31.0  44.9  20.8  9.1  5.7  17.6  24.0  100.1
    7    8.0          Houston Rockets  82  241.8  36.6  81.3  0.450  7.1  19.8  0.358  29.5  61.5  0.480  19.2  26.2  0.733  12.3  31.5  43.8  21.6  7.5  5.3  17.4  20.3   99.5
    8    9.0           Boston Celtics  82  240.6  37.2  83.9  0.444  5.1  15.4  0.331  32.2  68.5  0.469  19.8  26.5  0.745  13.5  29.5  43.0  21.2  9.7  3.5  15.4  27.1   99.3
    9   10.0     Seattle SuperSonics*  82  241.2  37.9  84.7  0.447  6.7  19.6  0.339  31.2  65.1  0.480  16.6  23.9  0.695  12.7  30.3  43.0  22.9  8.0  4.2  14.0  21.7   99.1
    10  11.0           Denver Nuggets  82  242.1  37.3  84.3  0.442  5.7  17.0  0.336  31.5  67.2  0.469  18.7  25.8  0.724  13.1  31.6  44.7  23.3  6.8  7.5  15.6  23.9   99.0
    11  12.0            Phoenix Suns*  82  241.5  37.7  82.6  0.457  5.6  15.2  0.368  32.1  67.4  0.477  17.9  23.6  0.759  12.5  31.2  43.7  25.6  9.1  5.3  16.7  24.1   98.9
    12  13.0  Minnesota Timberwolves*  82  242.7  39.3  84.3  0.467  3.0   8.7  0.346  36.3  75.5  0.481  16.8  21.6  0.780  12.4  30.1  42.5  26.9  7.6  5.4  13.9  23.3   98.5
    13  14.0       Charlotte Hornets*  82  241.2  35.8  79.7  0.449  4.1  12.2  0.339  31.7  67.5  0.469  22.7  30.0  0.758  10.8  32.1  42.9  24.7  8.9  5.9  14.7  20.4   98.4
    14  15.0          New Jersey Nets  82  241.8  36.3  83.9  0.433  5.8  16.8  0.347  30.5  67.2  0.454  19.5  24.9  0.784  12.7  28.2  40.9  20.6  8.8  4.8  13.6  23.3   98.0
    15  16.0  Portland Trail Blazers*  82  241.2  36.8  78.4  0.470  5.0  13.8  0.361  31.9  64.7  0.493  18.8  24.7  0.760  11.8  31.2  43.0  23.5  7.7  4.8  15.2  22.7   97.5
    16  17.0         Toronto Raptors*  82  240.9  36.3  83.9  0.433  5.2  14.3  0.363  31.2  69.6  0.447  19.3  25.2  0.765  13.4  29.9  43.3  23.7  8.1  6.6  13.9  24.3   97.2
    17  18.0      Cleveland Cavaliers  82  242.1  36.3  82.1  0.442  4.2  11.2  0.373  32.1  70.9  0.453  20.2  26.9  0.750  12.3  30.5  42.8  23.7  8.7  4.4  17.4  27.1   97.0
    18  19.0       Washington Wizards  82  241.5  36.7  81.5  0.451  4.1  10.9  0.376  32.6  70.6  0.462  19.1  25.7  0.743  13.0  29.7  42.7  21.6  7.2  4.7  16.1  26.2   96.6
    19  20.0               Utah Jazz*  82  240.9  36.1  77.8  0.464  4.0  10.4  0.385  32.1  67.4  0.476  20.3  26.2  0.773  11.4  29.6  41.0  24.9  7.7  5.4  14.9  24.5   96.5
    20  21.0       San Antonio Spurs*  82  242.1  36.0  78.0  0.462  4.0  10.8  0.374  32.0  67.2  0.476  20.1  27.0  0.746  11.3  32.5  43.8  22.2  7.5  6.7  15.0  20.9   96.2
    21  22.0    Golden State Warriors  82  240.9  36.5  87.1  0.420  4.2  13.0  0.323  32.3  74.0  0.437  18.3  26.2  0.697  15.9  29.7  45.6  22.6  8.9  4.3  15.9  24.9   95.5
    22  23.0      Philadelphia 76ers*  82  241.8  36.5  82.6  0.442  2.5   7.8  0.323  34.0  74.8  0.454  19.2  27.1  0.708  14.0  30.1  44.1  22.2  9.6  4.7  15.7  23.6   94.8
    23  24.0              Miami Heat*  82  241.8  36.3  78.8  0.460  5.4  14.7  0.371  30.8  64.1  0.481  16.4  22.3  0.736  11.2  31.9  43.2  23.5  7.1  6.4  15.0  23.7   94.4
    24  25.0            Atlanta Hawks  82  241.8  36.6  83.0  0.441  3.1   9.9  0.317  33.4  73.1  0.458  18.0  24.2  0.743  14.0  31.3  45.3  18.9  6.1  5.6  15.4  21.0   94.3
    25  26.0      Vancouver Grizzlies  82  242.1  35.3  78.5  0.449  4.0  11.0  0.361  31.3  67.6  0.463  19.4  25.1  0.774  12.3  28.3  40.6  20.7  7.4  4.2  16.8  22.9   93.9
    26  27.0         New York Knicks*  82  241.8  35.3  77.7  0.455  4.3  11.4  0.375  31.0  66.3  0.468  17.2  22.0  0.781   9.8  30.7  40.5  19.4  6.3  4.3  14.6  24.2   92.1
    27  28.0     Los Angeles Clippers  82  240.3  35.1  82.4  0.426  5.2  15.5  0.339  29.9  67.0  0.446  16.6  22.3  0.746  11.6  29.0  40.6  18.0  7.0  6.0  16.2  22.2   92.0
    28  29.0            Chicago Bulls  82  241.5  31.3  75.4  0.415  4.1  12.6  0.329  27.1  62.8  0.432  18.1  25.5  0.709  12.6  28.3  40.9  20.1  7.9  4.7  19.0  23.3   84.8
    29   NaN           League Average  82  241.5  36.8  82.1  0.449  4.8  13.7  0.353  32.0  68.4  0.468  19.0  25.3  0.750  12.4  30.5  42.9  22.3  7.9  5.2  15.5  23.3   97.5
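
Relying on index position 1 can break if the site adds or reorders tables. Here is a minimal offline sketch of the same comment-parsing idea that looks a table up by its `id` attribute instead. The sample HTML, the helper names `visible_table_ids` and `commented_table`, and the id `per_game-team` are all assumptions for illustration; check the real page source for the id actually used there.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical stand-in for a downloaded page: one regular table and one
# table wrapped in an HTML comment. The id "per_game-team" is an assumption.
SAMPLE_HTML = """
<html><body>
<table id="standings">
  <tr><th>Team</th><th>W</th><th>L</th></tr>
  <tr><td>Team A</td><td>50</td><td>32</td></tr>
</table>
<!--
<table id="per_game-team">
  <tr><th>Team</th><th>G</th><th>PTS</th></tr>
  <tr><td>Team A</td><td>82</td><td>105.0</td></tr>
</table>
-->
</body></html>
"""


def visible_table_ids(html):
    """Ids of the tables a plain find_all('table') can see."""
    soup = BeautifulSoup(html, "html.parser")
    return [table.get("id") for table in soup.find_all("table")]


def commented_table(html, table_id):
    """Return (headers, rows) for a commented-out table with this id, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        if "<table" not in comment:
            continue
        # Re-parse the comment text as HTML so its tags become searchable
        inner = BeautifulSoup(str(comment), "html.parser")
        table = inner.find("table", id=table_id)
        if table is None:
            continue
        rows = [
            [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
            for tr in table.find_all("tr")
        ]
        if rows:
            return rows[0], rows[1:]
    return None


print(visible_table_ids(SAMPLE_HTML))   # ['standings']
headers, rows = commented_table(SAMPLE_HTML, "per_game-team")
print(headers)                          # ['Team', 'G', 'PTS']
```

The first print shows why the original `soup.find_all('tr')[0]` landed on the standings table: the commented-out table is not part of the parsed tree until the comment text is parsed again. To use this against the live page, you would pass `requests.get(url).text` as `html`.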