
Scraping HREF Links contained within a Table


I've looked through a lot of similar questions, but none of them fixes my issue. I set up the script below (with help) to scrape the href links from a different URL, and it worked there.

I'm now trying to take the href links in the "Result" column from the URL used in the script below.

The script doesn't seem to be working like it did for other sites.

The table is an HTML element, but no matter how I tweak my script, I can't retrieve anything except a blank result.

Could someone explain to me why this is the case? I'm watching many YouTube videos trying to understand, but this just doesn't make sense to me.

    import requests
    from bs4 import BeautifulSoup

    profiles = []
    urls = [
        'https://stats.ncaa.org/player/game_by_game?game_sport_year_ctl_id=15881&id=15881&org_id=6&stats_player_seq=-100'
    ]
    for url in urls:
        req = requests.get(url)
        soup = BeautifulSoup(req.text, 'html.parser')
        for profile in soup.find_all('a'):
            profile = profile.get('href')
            profiles.append(profile)

    print(profiles)


Solution

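  • The following code works. The only real change from the script in the question is that the request sends a browser-like User-Agent header; the site appears not to return the page's HTML to the default requests User-Agent, which would explain the blank result: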

    import requests
    from bs4 import BeautifulSoup
    
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.60 Safari/537.17'}
    
    r = requests.get('https://stats.ncaa.org/player/game_by_game?game_sport_year_ctl_id=15881&id=15881&org_id=6&stats_player_seq=-100', headers=headers)
    soup = BeautifulSoup(r.text, 'html.parser')
    for x in soup.select('a'):
        print(x.get('href'))
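
The code above prints every link on the page, not only those in the "Result" column. Here is a minimal sketch of narrowing it down. It assumes the header cell reads exactly "Result" and that no cells use colspan. The names result_links and SAMPLE are hypothetical, and the sample markup is invented for illustration, not copied from the site.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def result_links(html, base_url, column="Result"):
    """Return absolute hrefs found in the named column of the page's table rows."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    col_index = None  # position of the wanted column, unknown until a header row is seen
    for row in soup.find_all("tr"):
        # Only direct cells of this row, so positions line up with the header.
        cells = row.find_all(["th", "td"], recursive=False)
        texts = [cell.get_text(strip=True) for cell in cells]
        if column in texts:
            # Header row: remember where the column sits and move on.
            col_index = texts.index(column)
            continue
        if col_index is None or col_index >= len(cells):
            continue
        for a in cells[col_index].find_all("a", href=True):
            # hrefs may be relative, so resolve them against the page URL.
            links.append(urljoin(base_url, a["href"]))
    return links


# Hypothetical markup, only to show the expected table shape.
SAMPLE = """
<table>
  <tr><th>Date</th><th>Opponent</th><th>Result</th></tr>
  <tr><td>09/01</td><td><a href="/teams/1">Team A</a></td>
      <td><a href="/contests/10/box_score">W 3-1</a></td></tr>
  <tr><td>09/08</td><td><a href="/teams/2">Team B</a></td><td>L 0-2</td></tr>
</table>
"""

links = result_links(SAMPLE, "https://stats.ncaa.org/player/game_by_game")
```

With the solution above, the same function could be called as result_links(r.text, r.url). If the real table uses colspan or nested tables, the column position would need extra handling.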