python, web-scraping, python-requests, python-requests-html

Scraping JS rendered page using Requests_HTML is not working as expected


I am scraping a JS-rendered page ( https://www.flipkart.com/search?q=Acer+Laptops ). On this page the product images are loaded dynamically. Before rendering, the SRC value of each image is this placeholder:

//img1a.flixcart.com/www/linchpin/fk-cp-zion/img/placeholder_9951d0.svg

After rendering, the SRC should be something like this:

https://rukminim1.flixcart.com/image/312/312/kcp4osw0/computer/f/w/d/acer-na-thin-and-light-laptop-original-imaftrdmuyxq5nrf.jpeg?q=70

Using requests_html I can get the rendered SRC values, but only for the first few images at the top of the page. Could you please help me out? My code:

from requests_html import HTMLSession

session = HTMLSession()
res = session.get("https://www.flipkart.com/search?q=Acer+Laptops")
res.html.render()  # run the page's JavaScript before querying the DOM
# Container for all the results
all_results = res.html.find('#container > div > div.t-0M7P._2doH3V > div._3e7xtJ > div._1HmYoV.hCUpcT > div:nth-child(2)', first=True)
items = all_results.find('._1UoZlX')  # Container for each product being displayed
for item in items:
    item_image = item.find('div._3BTv9X img', first=True).attrs.get('src')
    print(item_image)

Output:

https://rukminim1.flixcart.com/image/312/312/kamtsi80/computer/m/8/y/acer-na-gaming-laptop-original-imafs5prytwgrcyf.jpeg?q=70
https://rukminim1.flixcart.com/image/312/312/kcp4osw0/computer/f/w/d/acer-na-thin-and-light-laptop-original-imaftrdmuyxq5nrf.jpeg?q=70
//img1a.flixcart.com/www/linchpin/fk-cp-zion/img/placeholder_9951d0.svg
//img1a.flixcart.com/www/linchpin/fk-cp-zion/img/placeholder_9951d0.svg

As you can see, the first two images are loaded but the rest are not. Thank you all in advance!


Solution

  • I found the solution. Because the images are lazily loaded, I had to pass the "scrolldown" and "sleep" parameters to the "render()" function, so that the page is scrolled down and the remaining images get a chance to load. The code is below:

    from requests_html import HTMLSession

    session = HTMLSession()
    res = session.get("https://www.flipkart.com/search?q=Acer+Laptops")
    # Page down 20 times, pausing 0.1 s between scrolls, so lazy images load
    res.html.render(scrolldown=20, sleep=.1)
    # Container for all the results
    all_results = res.html.find('#container > div > div.t-0M7P._2doH3V > div._3e7xtJ > div._1HmYoV.hCUpcT > div:nth-child(2)', first=True)
    items = all_results.find('._1UoZlX')  # Container for each product being displayed
    for item in items:
        item_image = item.find('div._3BTv9X img', first=True).attrs.get('src')
        print(item_image)
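
    The values 20 and .1 worked for this page, but a longer result list might need more scrolling. As a rough sketch (not part of requests_html), a small helper can check whether any placeholders remain after rendering. The name split_loaded, the "placeholder" substring test and the base URL are assumptions for illustration, based only on the placeholder URL shown in the question:

```python
from urllib.parse import urljoin

# Assumption: a lazy-load placeholder can be recognized by this substring,
# taken from the placeholder URL shown in the question.
PLACEHOLDER_MARKER = "placeholder"


def split_loaded(srcs, base_url="https://www.flipkart.com/"):
    """Return (loaded, pending): absolute image URLs and the placeholder count."""
    loaded = []
    pending = 0
    for src in srcs:
        if not src or PLACEHOLDER_MARKER in src:
            pending += 1  # missing src, or still the lazy-load placeholder
        else:
            # urljoin turns a protocol-relative "//host/path" into "https://host/path"
            loaded.append(urljoin(base_url, src))
    return loaded, pending
```

    Collecting the src values from the loop above into a list and passing it to split_loaded gives a quick check: if pending is greater than zero, rendering again with a larger scrolldown or sleep value is one thing to try.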