python, selenium, screen-scraping

Scraping with Selenium not showing all data (possible duplicate)


I was trying to write a simple script to scrape a dynamic website (I'm a newbie with Selenium). The data I intend to scrape is the product name and the price. I ran the code and it worked, but it only showed 10 entries, while there are 60 entries per page. Here is the code:

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get('https://www.tokopedia.com/p/komputer-laptop/media-penyimpanan-data') # the link

product_name = driver.find_elements(By.CSS_SELECTOR, value='span.css-1bjwylw')
product_price = driver.find_elements(By.CSS_SELECTOR, value='span.css-o5uqvq')

list_product = []
list_price = []

for i in range(len(product_name)):
    list_product.append(product_name[i].text)

for j in range(len(product_price)):
    list_price.append(product_price[j].text)

driver.quit()

df = pd.DataFrame(columns=['product', 'price'])
df['product'] = list_product
df['price'] = list_price
print(df)

I used webdriver-manager's ChromeDriverManager instead of downloading the driver first and then pointing to it, because that seemed simpler. Also, I used Service instead of Options (many tutorials use Options) because I got some errors with Options, and with Service it worked out fine. Oh, and I'm using PyCharm, in case that matters.
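For what it's worth, Service and Options are not alternatives in Selenium 4: Service tells Selenium where the chromedriver binary lives, while Options configures the browser itself, and you can pass both at once. A minimal sketch, assuming Selenium 4.x with webdriver-manager (the --start-maximized flag is just an arbitrary example):

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Options configures the browser; Service locates the driver binary.
options = Options()
options.add_argument('--start-maximized')  # example flag, purely optional

driver = webdriver.Chrome(
    service=Service(ChromeDriverManager().install()),
    options=options,
)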

Any help or suggestions will be very much appreciated, thank you!


Solution

  • You need to scroll to the bottom of the page first so that all 60 entries get loaded. The website is dynamic, and more data loads as you scroll down. You can scroll via JavaScript through the webdriver like this: driver.execute_script("window.scrollTo(0, document.body.scrollHeight);") Add this below driver.get() and before the find_elements() calls (a full sketch follows below).

    Don't forget to sleep after scrolling, since the newly loaded content needs time to render.
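Putting the scroll and the sleep together, a minimal sketch of the whole script might look like this. The URL and CSS selectors are taken from the question (those css-* class names are auto-generated, so they may have changed since), and the 2-second sleep is a guess you may need to adjust:

import time

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get('https://www.tokopedia.com/p/komputer-laptop/media-penyimpanan-data')

# Keep scrolling to the bottom until the page height stops growing,
# sleeping after each scroll so the lazy-loaded products can render.
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break  # nothing new loaded, so this is the real bottom
    last_height = new_height

product_name = driver.find_elements(By.CSS_SELECTOR, 'span.css-1bjwylw')
product_price = driver.find_elements(By.CSS_SELECTOR, 'span.css-o5uqvq')

# Read the text before quitting: the elements are gone once the driver is.
list_product = [el.text for el in product_name]
list_price = [el.text for el in product_price]
driver.quit()

df = pd.DataFrame({'product': list_product, 'price': list_price})
print(df)

A fixed sleep is the simplest thing that works; a sturdier variant would poll len(driver.find_elements(...)) until the count stops growing, or use a WebDriverWait with a custom condition, but the scroll-then-wait idea is the same.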