I'm fairly new to coding, and I need to parse Yelp reviews so I can analyze the data with pandas. I've been trying to use Selenium and BeautifulSoup to automate the whole process, but every version of the code I write fails with a webdriver/chromedriver error.
!pip install selenium
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
import os
# Set the path to the ChromeDriver executable
chromedriver_path = "C:\\Users\\5mxz2\\Downloads\\chromedriver\\chromedriver"
# Set the URL of the Yelp page you want to scrape
url = "https://www.yelp.com/biz/gelati-celesti-virginia-beach-2"
# Set the options for Chrome
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--headless") # Run Chrome in headless mode, comment this line if you want to see the browser window
# Create the ChromeDriver instance
driver = webdriver.Chrome(executable_path=chromedriver_path, options=chrome_options)
# Load the Yelp page
driver.get(url)
# Extract the page source and pass it to BeautifulSoup
soup = BeautifulSoup(driver.page_source, "html.parser")
# Find all review elements on the page
reviews = soup.find_all("div", class_="review")
# Create empty lists to store the extracted data
review_texts = []
ratings = []
dates = []
# Iterate over each review element
for review in reviews:
    # Extract the review text
    review_text = review.find("p", class_="comment").get_text()
    review_texts.append(review_text.strip())
    # Extract the rating
    rating = review.find("div", class_="rating").get("aria-label")
    ratings.append(rating)
    # Extract the date
    date = review.find("span", class_="rating-qualifier").get_text()
    dates.append(date.strip())
# Create a DataFrame from the extracted data
data = {
    "Review Text": review_texts,
    "Rating": ratings,
    "Date": dates
}
df = pd.DataFrame(data)
# Print the DataFrame
print(df)
# Get the current working directory
path = os.getcwd()
# Save the DataFrame as a CSV file
csv_path = os.path.join(path, "yelp_reviews.csv")
df.to_csv(csv_path, index=False)
# Close the ChromeDriver instance
driver.quit()
That's what I have so far, but I keep getting this error message:
TypeError Traceback (most recent call last)
<ipython-input-4-5712027ca0bf> in <cell line: 18>()
16
17 # Create the ChromeDriver instance
---> 18 driver = webdriver.Chrome(executable_path=chromedriver_path, options=chrome_options)
19
20 # Load the Yelp page
TypeError: WebDriver.__init__() got an unexpected keyword argument 'executable_path'
Can someone help me fix this please? And if anyone has any advice regarding the task as a whole, please let me know.
This is due to changes in selenium 4.10.0: https://github.com/SeleniumHQ/selenium/commit/9f5801c82fb3be3d5850707c46c3f8176e3ccd8e
Note that executable_path was removed.
If you want to pass in an executable_path, you'll have to use the service arg now.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
service = Service(executable_path='./chromedriver.exe')
options = webdriver.ChromeOptions()
driver = webdriver.Chrome(service=service, options=options)
# ...
driver.quit()
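Applied to the script in the question, the same change could look roughly like the sketch below. The helper names existing_driver_path and make_chrome_driver are made up for illustration and are not part of Selenium. The fallback to a bare Service() assumes Selenium 4.6 or newer, where Selenium Manager tries to locate a driver when no path is given. Treat it as a starting point, not a tested drop-in.

```python
import os


def existing_driver_path(path):
    """Return path if it points to an existing file, otherwise None."""
    if path and os.path.isfile(path):
        return path
    return None


def make_chrome_driver(chromedriver_path=None, headless=True):
    """Build a Chrome driver using the Service-based constructor."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    if headless:
        options.add_argument("--headless")

    checked_path = existing_driver_path(chromedriver_path)
    if checked_path is not None:
        # Pass the driver location through Service, not executable_path.
        service = Service(executable_path=checked_path)
    else:
        # Assumption: Selenium 4.6+ ships Selenium Manager, which tries to
        # locate or download a matching driver when no path is supplied.
        service = Service()
    return webdriver.Chrome(service=service, options=options)
```

In the original script, the failing line would then become driver = make_chrome_driver(chromedriver_path), with everything after it unchanged. The path in the question has no .exe extension, so on Windows it may not point at the actual driver file; in that case this sketch falls back to letting Selenium find a driver on its own.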