Tags: python, web-scraping, python-requests, css-selectors, python-requests-html

Why does requests_html return an empty list even though the CSS selector or XPath is correct?


I am trying to scrape data from AliExpress using requests_html and CSS selectors, but it always returns an empty list. Can you please help?

The code I used:

from requests_html import HTMLSession


url = 'https://www.aliexpress.com/w/wholesale-test.html?catId=0&initiative_id=SB_20230516115154&SearchText=test&spm=a2g0o.home.1000002.0'


def create_session(url):
    session = HTMLSession()
    request = session.get(url)
    # The page is built by JavaScript, so render it and wait 25 seconds for the content to load
    request.html.render(sleep=25)
    prod = request.html.find(
        '#root > div > div > div.right--container--1WU9aL4.right--hasPadding--52H__oG > div > div.content--container--2dDeH1y > div.list--gallery--34TropR')
    print(prod)

create_session(url)

The output:

[]

Please note that I tried changing the CSS selector as shown below, and I always got an empty list:

1: prod = request.html.find('#root > div > div > div.right--container--1WU9aL4.right--hasPadding--52H__oG > div > div.content--container--2dDeH1y > div.list--gallery--34TropR')

2: prod = request.html.find('#root > div > div > div.right--container--1WU9aL4.right--hasPadding--52H__oG > div > div.content--container--2dDeH1y > div.list--gallery--34TropR > a:nth-child(1)')

3: prod = request.html.find('#root > div > div > div.right--container--1WU9aL4.right--hasPadding--52H__oG > div > div.content--container--2dDeH1y > div.list--gallery--34TropR > a:nth-child(n)')

4: prod = request.html.find('#root > div > div > div.right--container--1WU9aL4.right--hasPadding--52H__oG > div > div.content--container--2dDeH1y > div.list--gallery--34TropR > a')

5: prod = request.html.find('#root > div > div > div.right--container--1WU9aL4.right--hasPadding--52H__oG > div > div.content--container--2dDeH1y')

6: prod = request.html.find('div.manhattan--container--1lP57Ag.cards--gallery--2o6yJVt')

7: prod = request.html.find('a.manhattan--container--1lP57Ag.cards--gallery--2o6yJVt')

8: prod = request.html.find('div.list--gallery--34TropR')

Each of these also returned an empty list. Can you help, please?

I have noticed that the same code sometimes works and sometimes returns an empty list.


Solution

  • I finally solved the issue by increasing the rendering time
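
The answer does not say which value was increased, so the following is only a rough sketch of one way to apply the idea: render again with progressively longer waits until the selector matches something. The helper names find_with_backoff and render_products, the sleep values (5, 15, 30), and the timeout of 60 seconds are all assumptions made for illustration. The sleep and timeout keyword arguments are the ones accepted by requests_html's render().

```python
import time
from typing import Callable, Iterable, List


def find_with_backoff(
    render_and_find: Callable[[float], List],
    sleeps: Iterable[float] = (5, 15, 30),  # assumed values, tune for your connection
    pause: float = 0.0,
) -> List:
    """Call render_and_find with growing sleep values until it returns matches."""
    for sleep in sleeps:
        found = render_and_find(sleep)
        if found:
            return found
        time.sleep(pause)  # optional pause between attempts
    return []  # every attempt came back empty


def render_products(url: str, selector: str, sleep: float) -> List:
    """Hypothetical helper: fetch url, render its JavaScript, return the matches."""
    # Imported here so find_with_backoff can be used without requests_html installed.
    from requests_html import HTMLSession

    session = HTMLSession()
    response = session.get(url)
    # sleep: seconds to wait after the page has loaded in the headless browser.
    # timeout: seconds allowed for the page load itself (60 is an assumed value).
    response.html.render(sleep=sleep, timeout=60)
    return response.html.find(selector)
```

It could be called as find_with_backoff(lambda s: render_products(url, 'div.list--gallery--34TropR', s)), using the url and selector from the question. Retrying with a longer wait also fits the observation that the original code only works some of the time, since a fixed wait may be long enough on one run and too short on the next.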