I have a simple Python script that scrapes the name and image of every product in the men's section of H&M. The names come back without trouble, but only the first few image URLs are real; the rest are returned as: "data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" I've tried requests and Selenium with chromedriver separately. What am I missing?
First attempt (requests):
import requests
from bs4 import BeautifulSoup
# URL of the H&M men's section
url = "https://www2.hm.com/en_us/men/products/view-all.html?page=1"
# Headers to mimic a browser visit
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "Referer": "https://www.google.com/",
    "Connection": "keep-alive"
}
# Send a GET request to the webpage
response = requests.get(url, headers=headers)
# Check if the request was successful
if response.status_code == 200:
    # Parse the HTML content
    soup = BeautifulSoup(response.content, 'html.parser')
    # Find all the product items
    items = soup.find_all('article', class_='f0cf84')
    # Iterate over the items and extract the name and image URL
    for item in items:
        # Extract the product name
        name = item.find('a', class_='db7c79')['title']
        # Extract the image URL (the 'src' attribute of the <img> tag)
        img_tag = item.find('img', imagetype='PRODUCT_IMAGE')
        img_url = img_tag['src'] if img_tag else 'No image'
        # Print the name and image URL
        print(f"Product Name: {name}")
        print(f"Image URL: {img_url}\n")
else:
    print(f"Failed to retrieve the page. Status code: {response.status_code}")
Second attempt (selenium):
from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.Chrome()
# URL of the H&M men's section
url = "https://www2.hm.com/en_us/men/products/view-all.html?page=1"
# Open the webpage
driver.get(url)
# Get the page source and parse it with BeautifulSoup
soup = BeautifulSoup(driver.page_source, 'html.parser')
# Find all the product items
items = soup.find_all('article', class_='f0cf84')
# Iterate over the items and extract the name and image URL
for item in items:
    # Extract the product name
    name = item.find('a', class_='db7c79')['title']
    # Extract the image URL (the 'src' attribute of the <img> tag)
    img_tag = item.find('img', imagetype='PRODUCT_IMAGE')
    img_url = img_tag['src'] if img_tag else 'No image'
    # Print the name and image URL
    print(f"Product Name: {name}")
    print(f"Image URL: {img_url}\n")
# Quit the WebDriver
driver.quit()
The output is the same both times:
Product Name: Baggy Jeans
Image URL: https://image.hm.com/assets/hm/9e/53/9e53035efef96606bc4b50eaf6a0eee4f08a152c.jpg?imwidth=1536
Product Name: Regular Fit Cotton Shorts
Image URL: https://image.hm.com/assets/hm/8f/d8/8fd8d52f2e2c778041410f9a2727b448053ca8b7.jpg?imwidth=1536
Product Name: Regular Fit Linen-blend Shorts
Image URL: https://image.hm.com/assets/hm/d7/54/d7546a095c04387d1ad98575588c84e0426fb4be.jpg?imwidth=1536
Product Name: Muscle Fit Cotton Shirt
Image URL: https://image.hm.com/assets/hm/c7/d4/c7d49cef60f9d196d2f5347815f416bba7d4b636.jpg?imwidth=1536
Product Name: Slim Fit Ribbed Tank Top
Image URL: data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
Product Name: Slim Fit Jacket
Image URL: data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
Product Name: 5-pack Slim Fit T-shirts
Image URL: data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
Product Name: Regular Fit Linen-blend Resort Shirt
Image URL: data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
Product Name: Slim Fit Suit Pants
Image URL: data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
Product Name: Regular Fit Cotton Shorts
Image URL: data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
Product Name: Slim Fit Suit Pants
Image URL: data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
Product Name: Slim Fit Polo Shirt
Image URL: data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
Product Name: Baggy Jeans
Image URL: data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
Product Name: Slim Fit Half-zip Polo Shirt
Image URL: data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
Product Name: Slim Fit Linen Jacket
Image URL: data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
Product Name: Loose Fit Cargo Jeans
Image URL: data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
Product Name: Regular Fit Nylon Cargo Shorts
Image URL: data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
Product Name: Loose Fit T-shirt
Image URL: data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
Product Name: Loose Jeans
Image URL: data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
Product Name: Swim Shorts
Image URL: data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
Product Name: Regular Fit Chino Shorts
Image URL: data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
Product Name: Muscle Fit Polo Shirt
Image URL: data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
Product Name: Regular Fit Linen-blend Pants
Image URL: data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
Product Name: 5-pack Short Cotton Boxer Shorts
Image URL: data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
Product Name: Regular Fit Cropped Cotton Chinos
Image URL: data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
Product Name: Regular Fit Linen-blend Shirt
Image URL: data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
Product Name: Slim Fit Linen Suit Pants
Image URL: data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
Product Name: Regular Fit Linen-blend Shorts
Image URL: data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
Product Name: Swim Shorts
Image URL: data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
Product Name: Patterned Swim Shorts
Image URL: data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
Product Name: Patterned Swim Shorts
Image URL: data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
Product Name: Regular Fit T-shirt
Image URL: data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
Product Name: Regular Fit T-shirt
Image URL: data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
Product Name: Regular Fit Sweatshorts
Image URL: data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
Product Name: Regular Fit Cotton Shorts
Image URL: data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
Product Name: Slim Fit T-shirt
Image URL: data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
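To see what the repeated data URI in the output above actually contains, it can be decoded locally. The following is a minimal sketch using only the standard library; the function name describe_data_uri is hypothetical and introduced here for illustration. It reads the MIME type from the URI header and, for GIF payloads, the width and height from the GIF header.

```python
import base64
import struct

# The value that replaces the real image URL in the output above
PLACEHOLDER = (
    "data:image/gif;base64,"
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)


def describe_data_uri(uri):
    """Return basic facts about a base64 data URI (hypothetical helper)."""
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")

    # Strip the "data:" prefix and ";base64" suffix to get the MIME type
    mime_type = header[len("data:"):-len(";base64")]
    raw = base64.b64decode(payload)
    info = {"mime_type": mime_type, "size_bytes": len(raw)}

    # A GIF file starts with a 6-byte signature followed by the
    # little-endian 16-bit width and height of the logical screen
    if raw[:6] in (b"GIF87a", b"GIF89a"):
        width, height = struct.unpack("<HH", raw[6:10])
        info.update(width=width, height=height)
    return info


print(describe_data_uri(PLACEHOLDER))
```

The decoded image reports a width and height of 1, so the value is a tiny stand-in image rather than a product photo. That is consistent with a lazy-loading placeholder, although the page's own scripts would need to be inspected to confirm how the real URL is swapped in.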
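The data:image/gif value is a 1x1 placeholder GIF, most likely standing in for images that are lazy-loaded as the page is scrolled, so the src attribute never holds the real URL for those products. You can instead retrieve the information from the static JSON embedded at the end of the page (the script tag with id __NEXT_DATA__), using requests alone: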
import json

# Reuses `response` from the requests attempt in the question
if response.status_code == 200:
    # Parse the HTML content
    soup = BeautifulSoup(response.content, 'html.parser')
    # Load the JSON embedded in the __NEXT_DATA__ script tag
    next_data = json.loads(soup.find('script', {'id': '__NEXT_DATA__'}).get_text())
    # Each hit holds the product title and a relative image path
    for item in next_data['props']['pageProps']['plpProps']['productListingProps']['hits']:
        print(f"Product Name: {item['title']}")
        print(f"Image URL: {'https://image.hm.com/' + item['imageProductSrc']}\n")
OUTPUT:
Product Name: Baggy Jeans
Image URL: https://image.hm.com/assets/hm/3d/dd/3ddd1d7ee3dece637a88557a759b3502868b6ccd.jpg
Product Name: Regular Fit Cotton Shorts
Image URL: https://image.hm.com/assets/hm/00/79/00792d85a6f093d63513805bb50755be65e625b6.jpg
Product Name: Regular Fit Linen-blend Shorts
Image URL: https://image.hm.com/assets/hm/84/57/8457f6ee69e78e52a6a066f59dab6d416f4755d6.jpg
Product Name: Muscle Fit Cotton Shirt
Image URL: https://image.hm.com/assets/hm/c7/d4/c7d49cef60f9d196d2f5347815f416bba7d4b636.jpg
Product Name: Slim Fit Ribbed Tank Top
Image URL: https://image.hm.com/assets/hm/c1/2a/c12a71d4223049325463e8858352d9c88e5d1590.jpg
Product Name: Slim Fit Jacket
Image URL: https://image.hm.com/assets/hm/19/63/1963f78945f45ae8ea8c7f784c48197d7579675e.jpg
Product Name: 5-pack Slim Fit T-shirts
Image URL: https://image.hm.com/assets/hm/ef/84/ef847140f2084137e9142930801734502ab52ace.jpg
Product Name: Regular Fit Linen-blend Resort Shirt
Image URL: https://image.hm.com/assets/hm/64/5d/645da6d33d00dff6c973409498e0165435c0f35e.jpg
Product Name: Slim Fit Suit Pants
Image URL: https://image.hm.com/assets/hm/9a/11/9a113712bb917e853c24d444d7bf6dda63e84f0b.jpg
Product Name: Regular Fit Cotton Shorts
Image URL: https://image.hm.com/assets/hm/dd/a6/dda65d63ce74413808875fda0348e03878832232.jpg
Product Name: Slim Fit Suit Pants
Image URL: https://image.hm.com/assets/hm/c7/ff/c7ff2ba7d6eca8119908fcd7daf9066d3a8412dd.jpg
Product Name: Slim Fit Polo Shirt
Image URL: https://image.hm.com/assets/hm/fc/d7/fcd760bafdfb48ea8cccde14c5e3ad338dd96bfd.jpg
Product Name: Baggy Jeans
Image URL: https://image.hm.com/assets/hm/bc/fd/bcfd3f19e72773a735a3261355f490f6e2554238.jpg
Product Name: Slim Fit Half-zip Polo Shirt
Image URL: https://image.hm.com/assets/hm/03/3d/033dd2b17620eac8ebdf949c76cdc2b046a6bbd6.jpg
Product Name: Slim Fit Linen Jacket
Image URL: https://image.hm.com/assets/hm/5e/96/5e96bc27780b7002c2d97993b4f94bbfde01d610.jpg
Product Name: Loose Fit Cargo Jeans
Image URL: https://image.hm.com/assets/hm/fc/38/fc382304cef5c2a33a9a5a1b3c8cfc2e2c056f8b.jpg
Product Name: Regular Fit Nylon Cargo Shorts
Image URL: https://image.hm.com/assets/hm/a3/7d/a37db012d5f826763fece602dab3c7d44d8911c0.jpg
Product Name: Loose Fit T-shirt
Image URL: https://image.hm.com/assets/hm/e3/56/e3568a1492d1a9149da0401120fd82357a020eb0.jpg
Product Name: Loose Jeans
Image URL: https://image.hm.com/assets/hm/2c/77/2c77a9ff7cf1bc0cd4f2c2c94c23cff06ea3d555.jpg
Product Name: Swim Shorts
Image URL: https://image.hm.com/assets/hm/53/e6/53e6dbccb7a06a0d875217791b48d5a4c3c1def7.jpg
Product Name: Regular Fit Chino Shorts
Image URL: https://image.hm.com/assets/hm/f6/77/f677a6aab0df3447d0d6f6ab3146b1a78a7b5048.jpg
Product Name: Muscle Fit Polo Shirt
Image URL: https://image.hm.com/assets/hm/20/6f/206fbe107fe2aa85222b7b231e874274bf2421c6.jpg
Product Name: Regular Fit Linen-blend Pants
Image URL: https://image.hm.com/assets/hm/bb/79/bb79892f3ca98c59acdc959168fa7501c686c057.jpg
Product Name: 5-pack Short Cotton Boxer Shorts
Image URL: https://image.hm.com/assets/hm/3a/c7/3ac702fb6c64fe556b5033b0656e87bc64a5f921.jpg
Product Name: Regular Fit Cropped Cotton Chinos
Image URL: https://image.hm.com/assets/hm/94/52/945293e8bca2e00d9498a1250f541e5a372506ad.jpg
Product Name: Regular Fit Linen-blend Shirt
Image URL: https://image.hm.com/assets/hm/d5/9a/d59a0a9ccd4cc6ddeff0ffec007ae718b82e70fe.jpg
Product Name: Slim Fit Linen Suit Pants
Image URL: https://image.hm.com/assets/hm/eb/28/eb28c996f3b65e20bdf182e2d082016e61aa469c.jpg
Product Name: Regular Fit Linen-blend Shorts
Image URL: https://image.hm.com/assets/hm/f8/a8/f8a885cf83338303825afbda849304ba099f2d92.jpg
Product Name: Swim Shorts
Image URL: https://image.hm.com/assets/hm/31/05/3105df2f9e33d9c2c9665a819c0b55eef54e466a.jpg
Product Name: Patterned Swim Shorts
Image URL: https://image.hm.com/assets/hm/cb/e6/cbe6dafb3fb3ab98dbf1e502bf1af24bec4f2b1d.jpg
Product Name: Patterned Swim Shorts
Image URL: https://image.hm.com/assets/hm/74/1a/741a2c7c93a9e266060411d83c0d26435248fa7b.jpg
Product Name: Regular Fit T-shirt
Image URL: https://image.hm.com/assets/hm/44/42/4442fbac4e3080ec20b2f14e353fea267249b0dd.jpg
Product Name: Regular Fit T-shirt
Image URL: https://image.hm.com/assets/hm/bd/e4/bde4ef42f917ccb678c4ff1d218520ce2f10ff6d.jpg
Product Name: Regular Fit Sweatshorts
Image URL: https://image.hm.com/assets/hm/34/54/3454e1358929cdf81bccf06ac6e38372d00807f2.jpg
Product Name: Regular Fit Cotton Shorts
Image URL: https://image.hm.com/assets/hm/a2/a1/a2a105ee22bf93da3b28deb11f9d408b2b0bff4b.jpg
Product Name: Slim Fit T-shirt
Image URL: https://image.hm.com/assets/hm/09/58/0958cc08f86b7127b5dd8e0d0091824a337b6588.jpg
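The long chain of dictionary keys raises a KeyError as soon as the site renames any level, so a slightly more defensive variant may be useful. The sketch below is an illustration, not the site's documented API: it swaps BeautifulSoup for the standard library's html.parser so it runs offline, the names NextDataParser and extract_products are hypothetical, and SAMPLE_HTML is a hand-written stand-in for the real page that reuses one title and image path from the output above.

```python
import json
from html.parser import HTMLParser

# Key path taken from the snippet above; it may change if the site is updated
HITS_PATH = ("props", "pageProps", "plpProps", "productListingProps", "hits")
IMAGE_HOST = "https://image.hm.com/"


class NextDataParser(HTMLParser):
    """Collects the text inside <script id="__NEXT_DATA__">."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_products(html):
    """Return (title, image_url) pairs, or [] if the JSON layout differs."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    if not parser.chunks:
        return []

    node = json.loads("".join(parser.chunks))
    # Walk the key path one level at a time instead of chaining lookups
    for key in HITS_PATH:
        if not isinstance(node, dict) or key not in node:
            return []
        node = node[key]

    return [
        (hit.get("title"), IMAGE_HOST + hit["imageProductSrc"])
        for hit in node
        if "imageProductSrc" in hit
    ]


# Hand-written stand-in for the real page, for illustration only
SAMPLE_HTML = """
<html><body>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"plpProps": {"productListingProps": {"hits": [
  {"title": "Baggy Jeans",
   "imageProductSrc": "assets/hm/3d/dd/3ddd1d7ee3dece637a88557a759b3502868b6ccd.jpg"}
]}}}}}
</script>
</body></html>
"""

for title, image_url in extract_products(SAMPLE_HTML):
    print(f"Product Name: {title}")
    print(f"Image URL: {image_url}\n")
```

With the live site, response.text from the requests call in the question would be passed in place of SAMPLE_HTML. An empty list then signals that the script tag or one of the keys is missing, which is easier to handle than an exception raised in the middle of a scrape.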