requests_html render scrolldown, script not working


I need to scrape data from a website that loads its results as you scroll down. The page returns 5 items before scrolling, and I expect 80 items once scrolling is done. I'm using the requests_html module and tried this:

from requests_html import  HTML, HTMLSession

keyword = '유산균'
n = 1
url = f'https://search.shopping.naver.com/search/all?frm=NVSHATC&origQuery={keyword}&pagingIndex={n}&pagingSize=80&productSet=total&query={keyword}&sort=rel&timestamp=&viewType=list'

session = HTMLSession()
ses = session.get(url)
html = HTML(html=ses.text)

item_list = html.find('div.basicList_title__3P9Q7')
print(len(item_list))

ses.html.render(scrolldown=100, sleep=.1)

# Also tried this; it did not work either:
# ses.html.render(script="window.scrollTo(0, 99999)", sleep=10)

print(len(item_list))

I expected the two prints to output 5 and 80, but both printed 5.

What is wrong with my code?


Solution

  • If you monitor the network activity while the site loads, you'll see that it fetches the search results from an API. This means you can retrieve the data directly from the API instead of rendering and scraping the page. Here is an example that loads the first page into a pandas DataFrame:

    import requests
    import pandas as pd
    
    keyword = '유산균'
    n = 1
    # Query the JSON API that the search page itself calls
    r = requests.get(f'https://search.shopping.naver.com/api/search/all?sort=rel&pagingIndex={n}&pagingSize=80&viewType=list&productSet=total&deliveryFee=&deliveryTypeValue=&frm=NVSHATC&query={keyword}&origQuery={keyword}').json()
    # Each entry in the products list becomes one row
    df = pd.DataFrame(r['shoppingResult']['products'])
    

    To retrieve further pages, loop over n (the pagingIndex value) and repeat the request.
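
A minimal sketch of such a paging loop, under a few assumptions: the function names `fetch_page` and `collect_products` are made up for illustration, the query parameters are simply the ones from the URL above passed as a dict, and an empty product list is assumed (not confirmed) to mean there are no more results. The fetch function is passed in as an argument so the loop can be tried with a stand-in before hitting the real endpoint.

```python
from typing import Callable, Dict, List

API_URL = "https://search.shopping.naver.com/api/search/all"


def fetch_page(keyword: str, page: int, page_size: int = 80) -> List[Dict]:
    """Fetch one page of products from the API used in the example above."""
    # Third-party dependency, imported here so the loop below can be
    # exercised with a stand-in fetcher even where requests is not installed.
    import requests

    params = {
        "sort": "rel",
        "pagingIndex": page,
        "pagingSize": page_size,
        "viewType": "list",
        "productSet": "total",
        "deliveryFee": "",
        "deliveryTypeValue": "",
        "frm": "NVSHATC",
        "query": keyword,
        "origQuery": keyword,
    }
    response = requests.get(API_URL, params=params, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()["shoppingResult"]["products"]


def collect_products(
    keyword: str,
    max_pages: int,
    fetch: Callable[[str, int], List[Dict]] = fetch_page,
) -> List[Dict]:
    """Request pages 1..max_pages and concatenate their product lists."""
    products: List[Dict] = []
    for page in range(1, max_pages + 1):
        batch = fetch(keyword, page)
        if not batch:
            # Assumption: an empty page means the results are exhausted.
            break
        products.extend(batch)
    return products
```

With this in place, `pd.DataFrame(collect_products(keyword, max_pages=3))` would build one DataFrame from the first three pages. Adding a short pause between requests is probably sensible if you fetch many pages.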