python selenium selenium-webdriver google-trends

Retrieve all elements in Google Trends Data using selenium Python


I am trying to write a Python program to gather data from Google Trends (GT). Specifically, I want to automatically open URLs and read the values displayed in each title. I have written the code and it scrapes data successfully, but when I compare the data returned by the code with what is shown at the URL, the results are only partially returned. For example, in the image below, the code returns only the first title, "Manchester United F.C. • Tottenham Hotspur F.C.", but the actual website has 4 results: "Manchester United F.C. • Tottenham Hotspur F.C. , International Champions Cup, Manchester ". [Image: Google Trends page]

[Image: screenshot of the code's output]

We have tried every way of locating elements on the page that we could think of, but we are still unable to find a fix. We would prefer not to use Scrapy or Beautiful Soup for this.

    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options  # needed for Options() below

    links = ["https://trends.google.com/trends/trendingsearches/realtime?geo=DE&category=s"]

    # Path to the local ChromeDriver binary
    seleniumDriver = r"C:/Users/Downloads/chromedriver_win32/chromedriver.exe"

    Title_temp = []
    for link in links:
        chrome_options = Options()
        # Selenium 3 style API; Selenium 4 removed find_elements_by_class_name
        # in favour of find_elements(By.CLASS_NAME, ...)
        brow = webdriver.Chrome(executable_path=seleniumDriver, chrome_options=chrome_options)
        try:
            brow.get(link)  # open the URL
            # Collect the text of every element with class "details-top"
            content = brow.find_elements_by_class_name("details-top")
            for element in content:
                Title_temp.append(element.text)
        except Exception as error:
            print(error)
            break
        finally:
            brow.quit()  # always close the browser

    Final_df = pd.DataFrame({'Title': Title_temp})

Solution

  • From what I can see, the data is retrieved from an API endpoint that you can call directly. The code below calls it and extracts only the titles; the API response contains more than just the title, including article snippets, URLs, image links, etc.

    import requests
    import json

    r = requests.get('https://trends.google.com/trends/api/realtimetrends?hl=en-GB&tz=-60&cat=s&fi=0&fs=0&geo=DE&ri=300&rs=20&sort=0')
    # The body starts with a 5-character non-JSON prefix; skip it before parsing
    data = json.loads(r.text[5:])
    # Pull the title out of every trending story in the response
    titles = [story['title'] for story in data['storySummaries']['trendingStories']]
    print(titles)
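
The `[5:]` slice above hard-codes the length of the non-JSON prefix. A slightly more defensive variant is sketched below. It assumes only what the answer shows: the body contains a JSON object with `storySummaries` → `trendingStories` → `title`. The helper name `parse_trending_titles` and the sample payload are made up for illustration and are not real API output.

```python
import json


def parse_trending_titles(raw_text):
    """Return the story titles from a raw realtimetrends response body."""
    # Skip whatever precedes the first '{' rather than hard-coding 5 characters.
    start = raw_text.find("{")
    if start == -1:
        raise ValueError("no JSON object found in response")
    data = json.loads(raw_text[start:])
    # Tolerate missing keys so an unexpected response yields an empty list.
    stories = data.get("storySummaries", {}).get("trendingStories", [])
    return [story["title"] for story in stories if "title" in story]


# Hypothetical sample that mimics the response shape used in the answer.
sample = ")]}'\n" + json.dumps({
    "storySummaries": {
        "trendingStories": [
            {"title": "Story A"},
            {"title": "Story B"},
        ]
    }
})

titles = parse_trending_titles(sample)
print(titles)
```

With a live response you would pass `r.text` instead of `sample`. The resulting list can then go straight into `pd.DataFrame({'Title': titles})`, as in the question.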