I am trying to scrape a website, but it looks like I can't access all the links. The website is:
The procedure I am following is to first identify each separate product, and then get the link for each product. To my surprise, I can identify all the products on the page, but I can only get the link for the first 8, although the others should have a link too. My code is:
from requests_html import HTMLSession
s = HTMLSession()
url = "https://www.carrefour.es/supermercado/bebidas/refrescos/colas/cat650010/c?ic_source=portal-y-corporativo&ic_medium=menu-links&ic_content=ns"
r = s.get(url)
products = r.html.find('ul.product-card-list__list li')
for item in products:
    print(item.find('a', first=True).attrs["href"])
At some point I get the following error, since I can't find the link of the product, although it exists and the product seems to be loaded:
AttributeError: 'NoneType' object has no attribute 'attrs'
Any hints about where the problem is? Many thanks!!
This is possibly a JavaScript rendering issue: you are only downloading the raw HTML of the page, so anything that scripts fill in afterwards is not there yet. Try using Scrapinghub's Splash to render the page and check. I am in a country where the site is blocked, so I cannot help much more.
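If it helps, here is a rough sketch of how the loop could skip products whose link is missing and, optionally, render the page first. It is untested against the real site (I cannot reach it), so treat it as an assumption that rendering exposes the missing links. It uses the `render()` method that requests_html itself provides, instead of Splash; that method downloads Chromium the first time it runs. The names `extract_links` and `fetch_product_links` are made up for this example.

```python
def extract_links(items):
    """Collect the href of the first <a> in each item, skipping items without one."""
    links = []
    for item in items:
        anchor = item.find('a', first=True)
        if anchor is None:
            # The <li> exists but its link is not in the HTML (yet): skip it
            # instead of raising AttributeError on None.
            continue
        href = anchor.attrs.get("href")
        if href:
            links.append(href)
    return links


def fetch_product_links(url, render=False):
    """Hypothetical helper: download the page and return the product links."""
    # Imported here so extract_links can be used without requests_html installed.
    from requests_html import HTMLSession

    session = HTMLSession()
    response = session.get(url)
    if render:
        # Runs the page's JavaScript in headless Chromium before parsing.
        # Assumption: the missing links are added by scripts after page load.
        response.html.render(sleep=1)
    products = response.html.find('ul.product-card-list__list li')
    return extract_links(products)
```

Calling `fetch_product_links(url)` and then `fetch_product_links(url, render=True)` with the URL from the question and comparing the number of links returned should show whether JavaScript rendering is really the cause.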