I am trying to scrape some data from an esports stats site (vlr.gg). I decided to use BeautifulSoup, but I am having trouble scraping data from elements that share the same class name.
box5=soup.find_all("div",class_="match-header-vs-score")
for p in box5:
    matchtdetails=p.find("div",class_="match-header-vs-note").get_text(strip=True)
    print(" In: ",matchtdetails)
box6=soup.find_all("div",class_="match-header-vs-note")
for q in box6:
    if q.find("div",class_="match-header-vs-note"):
        matchdetails1=q.find("div",class_="match-header-vs-note").get_text(strip=True)
        print(matchdetails1)
This is the specific HTML code block I am working with here:
<div class="match-header-vs-score">
<div class="match-header-vs-note">
<span class="match-header-vs-note mod-upcoming">18h 24m
</span>
</div>
<div class="match-header-vs-placeholder"> –
</div>
<div class="match-header-vs-note">Bo3
</div>
</div>
I am new to HTML scraping. From what I understand, to extract a specific piece of data I should point the "box" at the element immediately wrapping the elements that hold the data I want, then loop over the results and use find() to pick out the specific class I want the data from.
box5 is used to get the time remaining, and box6 is for the match format (Bo1, Bo3, etc.). However, in this code box6 does not return anything.
As for box5, I was originally calling find_all() on "match-header-vs-note", since that was the immediate class wrapping "match-header-vs-note mod-upcoming", but it kept giving me a NoneType AttributeError. I thought the space, which denotes multiple classes, was the problem, but similar classes with spaces in their names worked elsewhere. After I changed the code for box5 to what I have given above, it works.
Originally my plan was to use find_all() within the loop and store the data in a list, but that gives the following error:
AttributeError: 'NavigableString' object has no attribute 'find_all'. Did you mean: '_find_all'?
You are close to the solution and have already selected the right elements, but you will not find another <div> with the corresponding class within box6, because each element you iterate over is already that <div>.
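To see why the inner lookup comes back empty, here is a small sketch that parses the HTML snippet from the question offline. The variable names (`html`, `notes`, `nested`, `texts`) and the choice of `html.parser` are my assumptions, not part of the original code:

```python
from bs4 import BeautifulSoup

# HTML snippet copied from the question, parsed offline (no network needed)
html = """
<div class="match-header-vs-score">
  <div class="match-header-vs-note">
    <span class="match-header-vs-note mod-upcoming">18h 24m
    </span>
  </div>
  <div class="match-header-vs-placeholder"> –
  </div>
  <div class="match-header-vs-note">Bo3
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Both <div class="match-header-vs-note"> elements are selected here
notes = soup.find_all("div", class_="match-header-vs-note")

# Each item in notes already IS the wanted <div>; searching inside it
# for another <div> with the same class finds nothing and returns None
nested = [q.find("div", class_="match-header-vs-note") for q in notes]

# Reading the text of the selected elements directly works
texts = [q.get_text(strip=True) for q in notes]

print(nested)
print(texts)
```

Since `find()` returns `None` for both elements, the `if` in the question's loop is never true, which is why nothing gets printed for box6.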
Try to simplify your approach and use the ResultSet as is:
box5,box6 = [e.get_text(strip=True) for e in soup.find_all("div",class_="match-header-vs-note")]
Or select the parent element and work on its text content:
box5,box6 = soup.find("div",class_="match-header-vs-score").get_text(strip=True).split('–')
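Both variants can be checked against the snippet from the question without hitting the site. This is only a sketch: the names `time_remaining` and `match_format` are hypothetical, and the length check is my addition, on the assumption that a page might not always contain exactly two matching elements:

```python
from bs4 import BeautifulSoup

# Condensed copy of the HTML from the question
html = """
<div class="match-header-vs-score">
  <div class="match-header-vs-note">
    <span class="match-header-vs-note mod-upcoming">18h 24m</span>
  </div>
  <div class="match-header-vs-placeholder"> – </div>
  <div class="match-header-vs-note">Bo3</div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Variant 1: take the text of every matching <div> directly
notes = [e.get_text(strip=True)
         for e in soup.find_all("div", class_="match-header-vs-note")]

# Variant 2: take the parent's text and split on the en dash placeholder
parts = (soup.find("div", class_="match-header-vs-score")
             .get_text(strip=True)
             .split("–"))

# Unpacking raises ValueError if the count is not exactly two,
# so check the length first (assumption: other pages may differ)
if len(notes) == 2:
    time_remaining, match_format = notes
else:
    time_remaining, match_format = None, None

print(time_remaining, match_format)
```

The first variant does not depend on the placeholder character between the two notes, so it is probably the less fragile of the two.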
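import requests
from bs4 import BeautifulSoup

url = 'https://www.vlr.gg/326896/leviat-n-gc-vs-firepower-game-changers-2024-latam-south-opening-w5'
# Name the parser explicitly so BeautifulSoup does not have to guess one
soup = BeautifulSoup(requests.get(url).text, 'html.parser')
# Unpacking assumes the page contains exactly two matching <div> elements
box5,box6 = [e.get_text(strip=True) for e in soup.find_all("div",class_="match-header-vs-note")]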
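As for the `'NavigableString' object has no attribute 'find_all'` error from the question: the failing code is not shown, so this is an assumption, but a likely cause is looping over a single `Tag` (for example the result of `find()`) instead of a `ResultSet`. Iterating a `Tag` yields its direct children, and those include plain text nodes (`NavigableString`) as well as tags. A minimal sketch on a shortened copy of the snippet, with hypothetical variable names, that keeps only the `Tag` children before searching:

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Shortened copy of the markup; the spaces between tags become text nodes
html = ('<div class="match-header-vs-score"> '
        '<div class="match-header-vs-note">Bo3</div> '
        '</div>')
soup = BeautifulSoup(html, "html.parser")

score = soup.find("div", class_="match-header-vs-score")

# Iterating a Tag walks its direct children, text nodes included
children = list(score)
has_text_nodes = any(isinstance(c, NavigableString) for c in children)

# Keep only real tags; these support find(), find_all() and get_text()
tag_children = [c for c in children if isinstance(c, Tag)]
texts = [c.get_text(strip=True) for c in tag_children]

print(has_text_nodes, texts)
```

Calling `find_all()` on the parent and looping over the returned `ResultSet` avoids the problem entirely, because a `ResultSet` only ever contains tags.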