I need to crawl data from a website where the data are loaded on scroll. The site returns 5 items before scrolling down, and should return 80 items once scrolling is done. I'm using the requests_html module and tried this:
from requests_html import HTML, HTMLSession
keyword = '유산균'
n = 1
url = f'https://search.shopping.naver.com/search/all?frm=NVSHATC&origQuery={keyword}&pagingIndex={n}&pagingSize=80&productSet=total&query={keyword}&sort=rel&timestamp=&viewType=list'
session = HTMLSession()
ses = session.get(url)
html = HTML(html=ses.text)
item_list = html.find('div.basicList_title__3P9Q7')
print(len(item_list))
ses.html.render(scrolldown=100, sleep=.1)
'''
I also tried this, but it didn't work either:
ses.html.render(script="window.scrollTo(0, 99999)", sleep=10)
'''
print(len(item_list))
I expected the prints to show 5 and 80, but both returned the same result: 5 and 5.
What is wrong with my code?
If you monitor the network activity while the site loads, you'll see that it fetches the search results from an API. This means you can retrieve the data directly from that API, without rendering or scraping the page at all. Here is an example that loads the first page into a pandas DataFrame:
import requests
import pandas as pd
keyword = '유산균'
n = 1
r = requests.get(f'https://search.shopping.naver.com/api/search/all?sort=rel&pagingIndex={n}&pagingSize=80&viewType=list&productSet=total&deliveryFee=&deliveryTypeValue=&frm=NVSHATC&query={keyword}&origQuery={keyword}').json()
df = pd.DataFrame(r['shoppingResult']['products'])
You can add a loop over pagingIndex to retrieve the next pages, etc.
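A minimal sketch of such a loop, assuming the API keeps accepting higher pagingIndex values and needs no extra request headers (build_url and fetch_pages are hypothetical helper names, not part of any library):

```python
import requests
import pandas as pd

BASE = 'https://search.shopping.naver.com/api/search/all'

def build_url(keyword: str, n: int, paging_size: int = 80) -> str:
    # Same query string as the single-page example, with the page
    # number as a parameter.
    return (f'{BASE}?sort=rel&pagingIndex={n}&pagingSize={paging_size}'
            f'&viewType=list&productSet=total&deliveryFee=&deliveryTypeValue='
            f'&frm=NVSHATC&query={keyword}&origQuery={keyword}')

def fetch_pages(keyword: str, pages: int) -> pd.DataFrame:
    # Fetch each page and concatenate the results into one DataFrame.
    frames = []
    for n in range(1, pages + 1):
        r = requests.get(build_url(keyword, n)).json()
        frames.append(pd.DataFrame(r['shoppingResult']['products']))
    return pd.concat(frames, ignore_index=True)

# df = fetch_pages('유산균', 3)  # first 3 pages, up to 240 products
```

Consider adding a short time.sleep between requests to avoid hammering the API.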