Tags: python, html, web-scraping, beautifulsoup, urllib3

BeautifulSoup can't find div with specific class


[Screenshot: webpage HTML and my code]

For some background: I have been learning web scraping to collect images for machine learning projects involving CNNs. I have been trying to scrape images from a site (in the screenshot, the HTML is on the left and my code is on the right) with no luck; my code prints an empty list. Is there something I am doing wrong?

For what it's worth, looking up other div tags by 'id' instead of 'class' did work, but for some reason my code can't find the divs I am looking for.

Edit:

import requests
import urllib3
from bs4 import BeautifulSoup

http = urllib3.PoolManager()
url = 'https://www.grailed.com/shop/EkpEBRw4rw'

response = http.request('GET', url)
soup = BeautifulSoup(response.data, 'html.parser')

img_div = soup.findAll('div', {'class': "listing-cover-photo "})
print(img_div)

Edit 2:

from bs4 import BeautifulSoup
from selenium import webdriver

url = 'https://www.grailed.com/shop/EkpEBRw4rw'
driver = webdriver.Chrome(executable_path='chromedriver.exe')
driver.get(url)
soup = BeautifulSoup(driver.page_source, 'html.parser')

listing = soup.select('.listing-cover-photo ')
for item in listing:
    print(item.select('img'))

OUTPUT:

[<img alt="Off-White Off White Caravaggio Hoodie" src="https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/cache=expiry:max/rotate=deg:exif/resize=width:480,height:640,fit:crop/output=format:webp,quality:70/compress/https://cdn.fs.grailed.com/api/file/yX8vvvBsTaugadX0jssT"/>]
(...a few more of these...)
[<img alt="Off-White Off-White Arrows Hoodie Black" src="https://process.fs.grailed.com/AJdAgnqCST4iPtnUxiGtTz/cache=expiry:max/rotate=deg:exif/resize=width:480,height:640,fit:crop/output=format:webp,quality:70/compress/https://cdn.fs.grailed.com/api/file/9CMvJoQIRaqgtK0u9ov0"/>]
[]
[]
[]
[]
(...many more empty lists...)

Solution

  • It looks like the website loads its data with JavaScript, so the HTML that urllib3 receives does not contain the listing divs yet. Try using Selenium together with BeautifulSoup:

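    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    
    url = "https://www.grailed.com/shop/EkpEBRw4rw"
    
    # Selenium 4 style; on Selenium 3, pass executable_path="..." to Chrome() instead.
    browser = webdriver.Chrome(service=Service("/path/to/chromedriver.exe"))
    browser.get(url)
    
    # page_source holds the DOM after the browser has run the page's JavaScript.
    soup = BeautifulSoup(browser.page_source, "html.parser")
    browser.quit()
    
    items = soup.select(".listing-cover-photo")
    print(items)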
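Regarding the empty lists in Edit 2: a likely cause (an assumption, not verified against the site) is that the images are lazy-loaded, so listings further down the page have no img tag yet when page_source is read. Below is a minimal sketch that waits for the listing divs to appear and then keeps only the listings that actually contain an image. The helper names extract_image_urls and fetch_rendered_html are hypothetical, and calling webdriver.Chrome() without arguments assumes that chromedriver is on your PATH or that your Selenium version can locate it on its own.

```python
from bs4 import BeautifulSoup


def extract_image_urls(html):
    """Return the src of the first <img> in each .listing-cover-photo div."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for div in soup.select("div.listing-cover-photo"):
        img = div.select_one("img")
        # Listings whose image has not loaded yet have no <img> (or no src),
        # so skip them instead of collecting empty results.
        if img is not None and img.get("src"):
            urls.append(img["src"])
    return urls


def fetch_rendered_html(url, timeout=10):
    """Open the page in Chrome and return the HTML after JavaScript has run."""
    # Imported here so extract_image_urls works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Assumption: chromedriver is on PATH or Selenium can find it itself.
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Block until at least one listing div exists in the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, "div.listing-cover-photo")
            )
        )
        return driver.page_source
    finally:
        driver.quit()


# Offline check with made-up HTML: one loaded listing, one not yet loaded.
sample = (
    '<div class="listing-cover-photo "><img alt="a" src="https://example.com/a.webp"/></div>'
    '<div class="listing-cover-photo "></div>'
)
print(extract_image_urls(sample))
```

For the real page you would call extract_image_urls(fetch_rendered_html(url)). If many listings still come back without images, scrolling the page in the browser before reading page_source may be needed so that the remaining images load.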