Tags: python, html, xmlhttprequest, jupyter-lab, httpexception

NFL Web Scraper HELP: NetworkError: Failed to execute 'send' on 'XMLHttpRequest': Failed to load


I am new to coding and need some assistance. I am trying to build a web scraper for a project that involves scraping NFL roster data from 2000 to 2023, but I am getting an error when requesting the HTML. I am using JupyterLab with the Python (Pyodide) kernel to write my code, and this is the only code I have:

import requests
from bs4 import BeautifulSoup
import pandas as pd
from io import StringIO

years = list(range(2000, 2024))
url = 'https://www.footballdb.com/teams/nfl/arizona-cardinals/roster/2023'
data = requests.get(url)

This is the error I'm getting:

(JsException: NetworkError: Failed to execute 'send' on 'XMLHttpRequest': Failed to load 'https://www.footballdb.com/teams/nfl/arizona-cardinals/roster/2023'.)

Can you explain why I am getting this error and how I can fix it?


Solution

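  • You didn't specify the request headers. Also, this page doesn't have <table> tags, so you can't use pd.read_html; the roster has to be parsed from the div elements instead. Note that the error message shows the request going through the browser's XMLHttpRequest (that is how Pyodide sends it), where cross-origin (CORS) restrictions apply, so the code below may still fail there and is best run in a regular Python installation.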

    import requests
    from bs4 import BeautifulSoup
    import pandas as pd

    url = "https://www.footballdb.com/teams/nfl/arizona-cardinals/roster/2023"
    # Browser-like headers; some sites reject the default python-requests user agent.
    headers = {
      'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
      'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36'
    }
    result = []
    response = requests.get(url, headers=headers)
    response.raise_for_status()  # stop early on an HTTP error status
    soup = BeautifulSoup(response.text, 'lxml')
    # The roster is built from <div> elements styled as a table, not a real <table>.
    table = soup.find('div', class_='divtable divtable-striped divtable-mobile')
    # Column names come from the children of the header row.
    table_head = [head.get_text() for head in table.find('div', class_='thead')]
    # Remove the mobile-only spans so their text does not end up in the cell values.
    for s in table.find_all('span', class_='visible-xs-inline'):
        s.extract()
    # Build one dict per row, mapping column name to cell text.
    for row in table.find_all('div', class_='tr'):
        cells = [cell.get_text() for cell in row.find_all('div', class_='td')]
        result.append(dict(zip(table_head, cells)))
    df = pd.DataFrame(result)
    print(df)
    

    OUTPUT:

         #            Player Pos   G  GS Age            College
    0   82   Andre Baccellia  WR   5   0  26         Washington
    1    3       Budda Baker  DB  12  12  27         Washington
    2   96        Eric Banks  DE   2   0  25  Texas-San Antonio
    3   51       Krys Barnes  LB  16   6  25               UCLA
    4   66    Jackson Barton  OT   1   0  28               Utah
    ..  ..               ...  ..  ..  ..  ..                ...
    73  21  Garrett Williams  DB   9   6  22           Syracuse
    74  27     Divaad Wilson  DB   2   1  23    Central Florida
    75  20      Marco Wilson  DB  15  11  24            Florida
    76  14    Michael Wilson  WR  13  12  23           Stanford
    77  10        Josh Woods  LB  11   7  27           Maryland
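
Since the question asks for rosters from 2000 to 2023, here is a minimal sketch of how the single-page code above could be repeated per season. It assumes that every season uses the same URL pattern as the 2023 page, which is not confirmed here, and that the parsing code is wrapped in a function. The names `roster_urls`, `scrape_seasons` and `parse_page` are made up for illustration; `parse_page(url)` stands for any function that returns a list of row dicts for one page.

```python
import time

# Assumption: older seasons follow the same URL pattern as the 2023 page.
BASE_URL = "https://www.footballdb.com/teams/nfl/{team}/roster/{year}"


def roster_urls(team, first_year=2000, last_year=2023):
    """Return a {year: url} dict for every season in the inclusive range."""
    return {
        year: BASE_URL.format(team=team, year=year)
        for year in range(first_year, last_year + 1)
    }


def scrape_seasons(team, parse_page, first_year=2000, last_year=2023, delay=1.0):
    """Collect rows for every season, tagging each row with its year.

    parse_page is a hypothetical callable: it takes a URL and returns
    a list of dicts (one per player), e.g. the parsing code shown above.
    """
    rows = []
    for year, url in roster_urls(team, first_year, last_year).items():
        for row in parse_page(url):
            rows.append({"Season": year, **row})
        time.sleep(delay)  # pause between requests to be polite to the site
    return rows
```

The combined rows can then be passed to `pd.DataFrame(...)` as before, for example `pd.DataFrame(scrape_seasons("arizona-cardinals", parse_page))`.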