python, selenium-webdriver, web-scraping

How to extract all the Google reviews from Google Maps


I need to scrape all the Google reviews for a place. The page has 90,564 reviews, but the code I wrote scrapes only the top 9; the rest are never collected.

The code is given below:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# specify the url of the business page on Google
url = 'https://www.google.com/maps/place/ISKCON+temple+Bangalore/@13.0098328,77.5510964,15z/data=!4m7!3m6!1s0x0:0x7a7fb24a41a6b2b3!8m2!3d13.0098328!4d77.5510964!9m1!1b1'

# create an instance of the Chrome driver
driver = webdriver.Chrome()

# navigate to the specified url
driver.get(url)

# Wait for the reviews to load
wait = WebDriverWait(driver, 20) # increased the waiting time
review_elements = wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME, 'wiI7pd')))

        
# extract the text of each review
reviews = [element.text for element in review_elements]

# print the reviews
print(reviews)

# close the browser
driver.quit()

What should I change in the code to extract all the reviews?


Solution

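  • Here is working code to run after launching the URL, that is, after driver.get(url) in the question's script and in place of everything that follows it there. It reuses the driver and the By, WebDriverWait and EC imports from that script.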

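        import time

        from selenium.webdriver.common.action_chains import ActionChains
        from selenium.webdriver.common.keys import Keys

        # Locators: review-count label, reviewer name, review text
        totalRev = "div div.fontBodySmall"
        username = ".d4r55"
        reviews = "wiI7pd"

        wait = WebDriverWait(driver, 20)

        # Read the count label (e.g. "90,564 reviews") and strip thousands separators
        totalRevCount = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, totalRev))).get_attribute("textContent").split(' ')[0].replace(',','').replace('.','')
        print("totalRevCount - ", totalRevCount)

        # Click the review-count label before scrolling
        wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, totalRev))).click()

        mydict = {}  # reviewer name -> review text
        found = 0

        while found < int(totalRevCount):

            # Re-read every review and reviewer name currently loaded in the page
            review_elements = wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME, reviews)))
            reviewer_names = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, username)))

            # Number of entries collected in earlier passes
            found = len(mydict)
            for rev, name in zip(review_elements, reviewer_names):
                mydict[name.text] = rev.text
                # A review without text (rating only) ends the whole loop
                if len(rev.text) == 0:
                    found = int(totalRevCount) + 1
                    break

            # Scroll down; send_keys presses and releases the key each time
            for i in range(8):
                ActionChains(driver).send_keys(Keys.ARROW_DOWN).perform()

            print("found - ", found)

            print(mydict)

            # Give the next batch of reviews time to load
            time.sleep(2)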
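
The review count in the code above is extracted with a chain of split and replace calls on one line. As a rough sketch, the same step can be isolated in a small helper so it can be checked against sample labels without a browser. The name parse_review_count and the sample label are illustrative and not part of the original answer, and the helper assumes the label text starts with the number.

```python
import re


def parse_review_count(label: str) -> int:
    """Return the leading integer of a label such as '90,564 reviews'.

    Commas, dots and whitespace inside the number are treated as
    thousands separators (an assumption about how the label is formatted).
    """
    match = re.match(r"\s*(\d[\d,.\s]*)", label)
    if match is None:
        raise ValueError(f"no leading number in {label!r}")
    # Drop every non-digit character that was captured with the number
    return int(re.sub(r"\D", "", match.group(1)))


print(parse_review_count("90,564 reviews"))  # 90564
```

In the scraper, the argument would be the textContent of the element located with the totalRev selector.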
    

    Explanation -

    1. Define the locators for the user name and the review text. The two are stored as a key-value pair in a dictionary, which keeps the result free of duplicates.
    2. First read the total number of reviews/ratings shown for the given location.
    3. Read the user name and review for every entry in the currently loaded ("visible") part of the page and store them in the dictionary.
    4. Scroll down the page and wait a few seconds for more reviews to load.
    5. Read the user names and reviews again and add them to the dictionary. Only new entries are added; names that are already present are simply overwritten.
    6. As soon as a review with no text (rating only) is encountered, the loop ends and you have your results.

    NOTE - If you want all reviews regardless of whether they contain review text, you can remove the "if" block.
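
Without the "if" block, the loop ends only when the number of collected entries reaches the total. A dictionary keyed by user name keeps one entry per name, so two reviewers with the same name collapse into one and the total may never be reached. The following is a minimal sketch of one possible alternative, not the answer's method: entries are keyed by (name, text), and the loop also stops after a few scrolls that add nothing new. collect_reviews, FakePage, max_stalls and the sample reviews are all made up for illustration; FakePage stands in for the browser so the logic can be run without Selenium.

```python
from typing import Callable, Dict, List, Tuple

Review = Tuple[str, str]  # (reviewer name, review text)


def collect_reviews(
    read_visible: Callable[[], List[Review]],
    scroll: Callable[[], None],
    total: int,
    max_stalls: int = 3,
) -> List[Review]:
    """Collect unique reviews until `total` is reached or scrolling stalls."""
    seen: Dict[Review, None] = {}  # dict used as an insertion-ordered set
    stalls = 0
    while len(seen) < total and stalls < max_stalls:
        before = len(seen)
        for review in read_visible():
            seen.setdefault(review, None)
        # Count consecutive passes that produced nothing new
        stalls = stalls + 1 if len(seen) == before else 0
        scroll()
    return list(seen)


class FakePage:
    """Hypothetical stand-in for the browser: each scroll shows the next batch."""

    def __init__(self, batches: List[List[Review]]) -> None:
        self.batches = batches
        self.position = 0

    def read_visible(self) -> List[Review]:
        # Past the last batch, the page keeps showing the same reviews
        return self.batches[min(self.position, len(self.batches) - 1)]

    def scroll(self) -> None:
        self.position += 1


page = FakePage([
    [("Asha", "Peaceful place"), ("Ravi", "")],
    [("Ravi", ""), ("Asha", "Crowded on weekends")],
])
result = collect_reviews(page.read_visible, page.scroll, total=10)
print(len(result))  # 3
```

With Selenium, read_visible would wrap the two element lookups from the loop above and return the name.text and rev.text pairs, and scroll would send the ARROW_DOWN key presses followed by the sleep.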