I am trying to write a Python program to gather data from Google Trends (GT). Specifically, I want to automatically open URLs and read the values displayed in the title of each story. I have written the code and can scrape data successfully, but when I compare the data returned by the code with what is shown at the URL, the results are only partially returned. For example, in the image below, the code returns only the first title, "Manchester United F.C. • Tottenham Hotspur F.C.", but the actual website shows 4 results: "Manchester United F.C. • Tottenham Hotspur F.C. , International Champions Cup, Manchester ". [Google Trends image]
We have tried every way of locating elements on the page that we could think of, but we still have not found a fix. We would prefer not to use Scrapy or Beautiful Soup for this.
import pandas as pd
import requests
import re
from bs4 import BeautifulSoup
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

links = ["https://trends.google.com/trends/trendingsearches/realtime?geo=DE&category=s"]
for link in links:
    Title_temp = []
    Title = ''
    seleniumDriver = r"C:/Users/Downloads/chromedriver_win32/chromedriver.exe"
    chrome_options = Options()
    brow = webdriver.Chrome(executable_path=seleniumDriver, chrome_options=chrome_options)
    try:
        brow.get(link)  ## getting the url
        try:
            # collect the text of every element with the "details-top" class
            content = brow.find_elements_by_class_name("details-top")
            for element in content:
                Title_temp.append(element.text)
            Title = ' '.join(Title_temp)
        except Exception:
            Title = ''
        brow.quit()
    except Exception as error:
        print(error)
        break
Final_df = pd.DataFrame(
    {'Title': Title_temp
    })
From what I can see, the data is retrieved from an API endpoint that you can call directly. I show how to call it and then extract only the titles (note that the API call returns more than just the title). You can explore the breadth of what is returned (which includes article snippets, URLs, image links, etc.) here.
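import requests
import json

# call the endpoint directly, with the same geo (DE) and category (s) as the page in the question
r = requests.get('https://trends.google.com/trends/api/realtimetrends?hl=en-GB&tz=-60&cat=s&fi=0&fs=0&geo=DE&ri=300&rs=20&sort=0')
# the first 5 characters of the body are not part of the JSON, so skip them before parsing
data = json.loads(r.text[5:])
# keep only the title of each trending story
titles = [story['title'] for story in data['storySummaries']['trendingStories']]
print(titles)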
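If you want to check the parsing step without hitting the network, here is a minimal sketch of the same extraction as a function. The helper name `parse_trending_titles` and the sample body are hypothetical and made up for illustration. The sketch assumes the response starts with a short non-JSON prefix (the code above skips 5 characters) and finds the first `{` rather than hard-coding the offset.

```python
import json


def parse_trending_titles(raw_text):
    """Return the story titles from a raw realtimetrends response body.

    Hypothetical helper: it assumes the JSON object is preceded by a short
    non-JSON prefix and that titles live under
    storySummaries -> trendingStories -> title, as in the code above.
    """
    start = raw_text.find("{")  # skip whatever precedes the JSON object
    if start == -1:
        raise ValueError("no JSON object found in response body")
    data = json.loads(raw_text[start:])
    stories = data.get("storySummaries", {}).get("trendingStories", [])
    return [story.get("title", "") for story in stories]


# Made-up sample body (not real API output), with an assumed 5-character prefix.
sample_body = ")]}'\n" + json.dumps(
    {"storySummaries": {"trendingStories": [{"title": "Story A"},
                                            {"title": "Story B"}]}}
)

titles = parse_trending_titles(sample_body)
print(titles)
```

With a live response you would pass `r.text` instead of `sample_body`. The resulting list can go straight into the DataFrame from the question, e.g. `pd.DataFrame({'Title': titles})`.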