My goal is to parse the images from the second page below. I am using bs4 and Python 3 for this. Please look at these two pages:
1) The page with images for all 4 colors (I can parse this page);
2) The page that contains images for only 1 color (chrome in this example). This is the page I need to parse.
In a browser I can see that the second page differs from the first one. With bs4, however, I get the same results for both pages, as if Python did not recognize the ".html#/kolor-chrom" part of the second address.
First page address: "https://azzardo.com.pl/lampy-techniczne/2111-bross-1-tuba-lampa-techniczna-azzardo.html".
Second page address: "https://azzardo.com.pl/lampy-techniczne/2111-bross-1-tuba-lampa-techniczna-azzardo.html#/kolor-chrom".
Code to reproduce:
from bs4 import BeautifulSoup
import requests
adres1 = "https://azzardo.com.pl/lampy-techniczne/2111-bross-1-tuba-lampa-techniczna-azzardo.html"
adres2 = "https://azzardo.com.pl/lampy-techniczne/2111-bross-1-tuba-lampa-techniczna-azzardo.html#/kolor-chrom"
def parse_one_page(adres):
    """Parse one page and return the src of every img found at adres."""
    # Send a browser-like User-Agent so the site does not reject the script
    headers = {'User-Agent': 'Mozilla/5.0'}
    # Get the page
    page = requests.get(adres, headers=headers)  # timeout=5 could be added here
    # Parse all of the HTML
    soup = BeautifulSoup(page.content, 'html.parser')
    # Find every div with class "clearfix"
    divclear = soup.find_all("div", class_="clearfix")
    # Pick the tenth one (index 9)
    divclear = divclear[9]
    # Collect the img tags under the first child of that div
    imgtag = [i.find_all("img") for i in divclear][0]
    # Extract the src attribute of each img
    src = [i["src"] for i in imgtag]
    # Show how many images were found
    print(len(src))
    # Return the list of img src values
    return src
print(parse_one_page(adres1))
print(parse_one_page(adres2))
After running this code you will see that the output for the two addresses is the same: 24 images from each. For the first page, 24 images is correct. But the second page should yield only 2 images, not 24!
I hope someone can help me parse the second page correctly in Python 3 using bs4.
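Yep, it looks like this page can't be parsed with bs4 alone. The part after "#" is a URL fragment: the browser keeps it to itself and never sends it to the server, so requests downloads the same HTML for both addresses. The per-color view is presumably built afterwards by JavaScript running in the browser.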
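To see why both calls return the same 24 images, it may help to split the two addresses with the standard library's `urldefrag`. This is only a minimal sketch: the helper name `split_fragment` is made up for illustration, and it makes no network request. It just shows that, once the fragment is removed, the two addresses are the same URL.

```python
from urllib.parse import urldefrag

adres1 = "https://azzardo.com.pl/lampy-techniczne/2111-bross-1-tuba-lampa-techniczna-azzardo.html"
adres2 = adres1 + "#/kolor-chrom"


def split_fragment(adres):
    """Hypothetical helper: return (url_without_fragment, fragment)."""
    url, fragment = urldefrag(adres)
    return url, fragment


url1, frag1 = split_fragment(adres1)
url2, frag2 = split_fragment(adres2)

# The server is only ever asked for the part before "#",
# so both addresses lead to the same document.
print(url1 == url2)
# The fragment stays on the client side, where page scripts can read it.
print(repr(frag1), repr(frag2))
```

If that is what is happening here, getting only the chrome images would need either a tool that actually runs the page's JavaScript (a browser-automation library, for example) or a look at the page source for the data the script uses to switch colors. Neither is confirmed for this particular site.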