Search code examples
pythonpandasbeautifulsouppython-requestspython-requests-html

How to scrape specified div tag from HTML code using Pandas?


Hate asking this because I've seen a lot of similar issues, but nothing that really guides me to the light unfortunately.

I'm trying to scrape the game story from this link (ultimately I want to build in the ability to do multiple links, but hoping I can handle that)

Story to scrape

It seems like I should be able to take the div story tag, right? I am struggling through how to do that - I've tried various codes I've found online & have tried to tweak, but nothing really applies.

I found this amazing source which really taught me a lot about how to do this.

However, I'm still struggling - this is my code thus far:

import pandas as pd
    
import requests
from bs4 import BeautifulSoup
    
url = 'https://www.nba.com/game/bkn-vs-phi-0022100993'
    
html = requests.get(url)
    
soup = BeautifulSoup(html.text, 'html.parser')
    
# print(soup.prettify())
    
story = soup.find_all('div', {'id': 'story'})
    
print (story)

I'm really trying to learn this, not just copy and paste, in my mind right now this says (in English):

  1. Import packages needed
  2. Get URL HTML Text Data ---- (when I printed the complete code it worked fine)
  3. Narrow down the HTML code to only include div tags labeled as "story" -- this obviously is the hiccup

Struggling to understand, going to keep playing with this but figured I'd turn here for some advice - any thoughts are greatly appreciated. Just getting blank result right now;


Solution

  • Page is being rendered by javascript, which requests cannot execute, so the info (which is being pulled down by the original requests) remains in its incipient state, within the script tag.

    This is one way to get that story with requests:

    import requests
    from bs4 import BeautifulSoup as bs
    import json 
    
    headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
    }
    
    url = 'https://www.nba.com/game/bkn-vs-phi-0022100993'
    
    r = requests.get(url, headers=headers)
    
    soup = bs(r.text, 'html.parser')
    
    page_obj = soup.select_one('script#__NEXT_DATA__')
    json_obj = json.loads(page_obj.text)
    print('Title:', json_obj['props']['pageProps']['story']['header']['headline'])
    print('Date:', json_obj['props']['pageProps']['story']['date'])
    print('Content:', json_obj['props']['pageProps']['story']['content'])
    

    Result printed in terminal:

    Title: Durant, Nets rout 76ers in Simmons' return to Philadelphia
    Date: 2022-03-11T00:30:27
    Content: ['PHILADELPHIA (AP)  The 76ers fans came to boo Ben Simmons. They left booing their own team.', "Kevin Durant scored 18 of his 25 points in Brooklyn's dominating first half in the Nets' 129-100 blowout victory over the 76ers on Thursday night in Simmons' much-hyped return to Philadelphia.", 'Seth Curry added 24 points, and Kyrie Irving had 22 for the Nets. They entered in eighth place in the East, but looked like a legitimate conference contender while badly outplaying the third-place 76ers.', 'Joel Embiid had 27 points and 12 rebounds for the 76ers, and James Harden finished with just 11 points. It was the first loss for Philadelphia in six games with Harden in the lineup.', "The game was dubbed as ''Boo Ben'' night, but the raucous fans instead turned their displeasure on the home team when the 76ers went to the locker room trailing 72-51 and again when Brooklyn built a stunning 32-point lead in the third quarter.", "''I think all of us look at Ben as our brother,'' Durant said. ''We knew this was a hostile environment. It's hard to chant at Ben Simmons when you're losing by that much.''", 'Simmons, wearing a designer hockey jersey and flashy jewelry, watched from the bench, likely taking delight in the vitriol deflected away from him. The three-time All-Star is continuing to recover from a back injury that has sidelined him since being swapped for Harden in a blockbuster deal at the trade deadline.', "''We definitely felt like Ben was on our heart,'' Irving said. ''If you come at Ben, you come at us.''", "While Simmons hasn't taken the floor yet, Harden had been a boon for the 76ers unlike his time in Brooklyn, where the so-called Big 3 of Harden, Durant and Irving managed to play just 16 games together following Harden's trade to Brooklyn last January that was billed as a potentially championship move. Harden exchanged fist bumps with Nets staff members just before tip before a shockingly poor performance from the 10-time All-Star and former MVP.", 'Harden missed 14 of 17 field-goal attempts.', "''We just didn't have the pop that we needed,'' Harden said.", 'The only shot Simmons took was a dunk during pregame warmups that drew derisive cheers from the Philly fans.', "The boos started early, as Simmons was met with catcalls while boarding the team bus to shootaround from the Nets' downtown hotel. Simmons did oblige one fan for an autograph, with another being heard on a video widely circulated on social media yelling, ''Why the grievance? Why spit in the face of Sixers fans? We did nothing but support you for five years, Ben. You know that.''", "The heckling continued when Simmons was at the arena. He entered the court 55 minutes prior to tip, wearing a sleeveless Nets warmup shirt and sweats and spent 20 minutes passing for Patty Mills' warmup shots. He didn't embrace any of his former teammates, though he did walk the length of the court to hug a 76ers official and then exchanged fist pumps with coach Doc Rivers at halftime.", "''Looked good to me, looked happy to be here,'' Nets coach Steve Nash said. ''I think he was happy to get it out of the way.''", "A large security presence closely watched the crowd and cell phones captured every Simmons move. By the end of the game, though, many 76ers fans had left and the remaining Nets fans were chanting: ''BEN SIM-MONS! BEN SIM-MONS!'' in a remarkable turnaround from the start of the evening.", 'WELCOME BACK', 'Former 76ers Curry and Andre Drummond, who also were part of the Simmons for Harden trade, were cheered during introductions, Curry made 10 of 14 shots, including 4 of 8 from 3-point range. Drummond had seven points and seven rebounds.', 'MOVE OVER, REGGIE', "Harden passed Reggie Miller for third on the NBA's 3-point list when he made his 2,561st trey with 6:47 left in the first quarter.", "TRAINER'S ROOM", 'Nets: LaMarcus Aldridge (hip) missed his second straight contest.', "76ers: Danny Green sat out after injuring his left middle finger in the first half of the 76ers' 121-106 win over the Bulls on Monday.", 'TIP-INS', 'Nets: Improved to 21-15 on the road, where Irving is only allowed to play due to his vaccination status. ... Durant also had 14 rebounds and seven assists.', "76ers: Paul Millsap returned after missing Monday's game against Chicago due to personal reasons but didn't play. . Former Sixers and Hall of Famers Allen Iverson and Julus ''Dr. J'' Erving were in attendance. Erving rang the ceremonial Liberty Bell before the contest.", 'UP NEXT', 'Nets: Host New York on Sunday.', '76ers: At Orlando on Sunday.', '---', 'More AP NBA: https://apnews.com/hub/NBA and https://twitter.com/AP-Sports']
    

    Requests documentation: https://requests.readthedocs.io/en/latest/

    Also, for BeautifulSoup: https://beautiful-soup-4.readthedocs.io/en/latest/index.html