python, web, python-requests, screen-scraping, python-requests-html

Requests-html not getting all links


I am trying to scrape a website, but it looks like I can't access all the links. The website is:

https://www.carrefour.es/supermercado/bebidas/refrescos/colas/cat650010/c?ic_source=portal-y-corporativo&ic_medium=menu-links&ic_content=ns

My procedure is to first identify each separate product and then get the link for each one. To my surprise, I can identify all the products on the page, but I can only get the link for the first 8, although the others should have a link too. My code is:

from requests_html import HTMLSession
    
s = HTMLSession()

url = "https://www.carrefour.es/supermercado/bebidas/refrescos/colas/cat650010/c?ic_source=portal-y-corporativo&ic_medium=menu-links&ic_content=ns"
r = s.get(url)

products = r.html.find('ul.product-card-list__list li')


for item in products:
    print(item.find('a', first=True).attrs["href"])

At some point I get the following error because the link for a product is not found, although the link exists and the product seems to be loaded:

AttributeError: 'NoneType' object has no attribute 'attrs'

Any hints about where the problem is? Many thanks!!


Solution

  • This is possibly a JavaScript rendering issue: you are only downloading the static HTML page content, so anything added later by scripts would be missing. Try using Scrapinghub's Splash to render the page. I am in a country where the site is blocked, so I cannot help much more.

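If the cause really is JavaScript rendering, one thing to try before setting up Splash is requests-html's own render() method, which runs the page's scripts in a headless Chromium. The following is only a minimal sketch under that assumption, and it has not been checked against this site. The helper names collect_links and fetch_rendered_links are made up for illustration, and the scrolldown, sleep and timeout values are guesses. The None check also avoids the AttributeError for product cards that still have no link.

```python
def collect_links(items):
    """Return (links, missing): the hrefs found, and how many items had no usable <a>."""
    links, missing = [], 0
    for item in items:
        # find(..., first=True) returns None when the item contains no <a> element
        anchor = item.find("a", first=True)
        href = anchor.attrs.get("href") if anchor is not None else None
        if href is None:
            missing += 1
        else:
            links.append(href)
    return links, missing


def fetch_rendered_links(url, selector="ul.product-card-list__list li"):
    """Hypothetical helper: load the page, run its JavaScript, then collect the links."""
    # Imported here so collect_links can be used without requests-html installed.
    from requests_html import HTMLSession

    session = HTMLSession()
    response = session.get(url)
    # render() executes the page's JavaScript in headless Chromium
    # (Chromium is downloaded the first time render() is called).
    # These values are assumptions, not tuned for this site.
    response.html.render(scrolldown=5, sleep=1, timeout=20)
    return collect_links(response.html.find(selector))
```

Calling links, missing = fetch_rendered_links(url) with the URL from the question returns the links that were found plus a count of product cards that still had none. If that count stays above zero after rendering, the remaining cards are probably filled in some other way (for example only after more scrolling), and an external renderer such as Splash may be the better option.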