python, pandas, screen-scraping

Scraping data from Basketball Reference: the loop is not using the full URL


The loop only seems to fill in the URL up to 'https://www.basketball-reference.com/teams/{0}' and ignores everything after it, so it requests the wrong URL and pulls the wrong data:

team_abbrev = pd.read_csv(r'C:\Users\micha\OneDrive\Desktop\NBA\team_abbreviations.csv')



for i in team_abbrev:
    url = ('https://www.basketball-reference.com/teams/{0}/2022/gamelog-advanced/#tgl_advanced').format(i)

    team_perf = pd.read_html(url)[0]

Solution

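  • Your loop never iterates over the rows of the DataFrame. Iterating over a DataFrame directly (for i in team_abbrev) yields its column labels, not its rows, so the URL is built from the column name rather than a team abbreviation. After loading the CSV into a DataFrame, iterate over its rows instead: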

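    import pandas as pd
    import requests

    def baskiceball():
        filename = 'C:/Users/Me/Desktop/teams.csv'
        df = pd.read_csv(filename)
        # iterrows() yields (index, row) pairs; each row is a Series of that row's values
        for index, row in df.iterrows():
            # Visit every column in the row (the CSV here has only one)
            for x in range(0, len(row)):
                # row.iloc[x] is positional access (written row[x] in older pandas)
                url = f'https://www.basketball-reference.com/teams/{row.iloc[x]}/2022/gamelog-advanced/#tgl_advanced'
                r = requests.get(url)
                data = r.status_code
                # Print the abbreviation next to the HTTP status code
                print(f"{row.iloc[x]} | {data}")

    baskiceball()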
    

    My teams.csv document has the team abbreviations in a single column:

    team_abbreviation
    SAC
    GSW
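    With a file shaped like that, the difference between iterating over the DataFrame and iterating over its column can be checked without touching the network. This is only a small sketch: the in-memory CSV text stands in for teams.csv, and the variable names are made up for illustration.

```python
import io

import pandas as pd

# Stand-in for teams.csv, using the two abbreviations shown above.
csv_text = "team_abbreviation\nSAC\nGSW\n"
df = pd.read_csv(io.StringIO(csv_text))

# Iterating over the DataFrame itself yields column labels, not row values.
labels = list(df)

# Iterating over a single column (a Series) yields the values in that column.
abbrevs = list(df["team_abbreviation"])

print(labels)   # ['team_abbreviation']
print(abbrevs)  # ['SAC', 'GSW']
```

    The first list is what the loop in the question was formatting into the URL; the second is what it needed.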
    

    You plug each value from the row into the URL path.

    You make the request with r = requests.get(url).

    You read the response. Here I used r.status_code because the URL returns an HTML page rather than JSON, and I only wanted to show that each request succeeds. The result:

    SAC | 200
    GSW | 200
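
    To get the tables themselves rather than status codes, the same loop can hand each URL to pd.read_html, as the question tried to do. The following is a rough sketch, not a tested scraper: build_urls, fetch_game_logs, the column name and the delay between requests are all assumptions for illustration, and taking table [0] simply mirrors the question. The #tgl_advanced fragment is left off because URL fragments are not sent to the server.

```python
import time

import pandas as pd

URL_TEMPLATE = "https://www.basketball-reference.com/teams/{abbrev}/2022/gamelog-advanced/"


def build_urls(abbrevs):
    """Return one game-log URL per team abbreviation, stripping stray whitespace."""
    return [URL_TEMPLATE.format(abbrev=str(a).strip()) for a in abbrevs]


def fetch_game_logs(csv_path, column="team_abbreviation", delay_seconds=5.0):
    """Read abbreviations from csv_path and return {abbreviation: DataFrame}.

    Hypothetical helper: the column name and delay are assumed defaults.
    """
    abbrevs = pd.read_csv(csv_path)[column]
    logs = {}
    for abbrev, url in zip(abbrevs, build_urls(abbrevs)):
        # As in the question, take the first table that read_html finds.
        logs[str(abbrev).strip()] = pd.read_html(url)[0]
        # Assumed pause between requests, to avoid hammering the site.
        time.sleep(delay_seconds)
    return logs


# URL construction can be checked offline.
print(build_urls(["SAC", "GSW "]))
```

    Calling fetch_game_logs('teams.csv') would then return one DataFrame per team, assuming an HTML parser such as lxml is installed for pd.read_html and the site serves the page to the request.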