Google Trend Crawler: CSV writing issues


The code below is a Google Trends crawler that uses the unofficial API from https://github.com/GeneralMills/pytrends. It runs fine, but the request limit for Google Trends is not documented. If I run the crawler with a list ("DNA") of 2000 or more keywords, I get an error saying I have exceeded the request limit. When that happens, all of the data crawled before the limit is lost, because I only write to CSV at the end of the code. Is there a way to write my data to CSV on every loop iteration, so that even if I hit the limit I still have the data collected up to that point? Thanks

from pytrends.request import TrendReq
from datetime import datetime
import pandas as pd
import time
import xlsxwriter

pytrends = TrendReq(hl='en-US', tz=360)
Data = pd.DataFrame()

#for loop check writer path
path = "C:/Users/aijhshin/Workk/GoogleTrendCounter.txt"
#file = open(path,"a") 

#setting index using 'apple' keyword 
kw_list = ['apple']
pytrends.build_payload(kw_list, cat=0, timeframe='today 5-y', geo='', gprop='')
Googledate = pd.DataFrame(pytrends.interest_over_time())
Data['Date'] = Googledate.index

#Google Trend Crawler limit = 1600 request per day
#DNA is the list of keywords to crawl (defined elsewhere)
for i in range(len(DNA)):
    kw_list = [DNA[i]]
    pytrends.build_payload(kw_list, cat=0, timeframe='today 5-y', geo='', gprop='')

    #results
    df = pd.DataFrame(pytrends.interest_over_time())
    if df.empty:
        Data[DNA[i]] = ""  
    else:                         
        df.index.name = 'Date'
        df.reset_index(inplace=True)
        Data[DNA[i]] = df.loc[:, DNA[i]]

    #test for loop process 
    with open(path, "a") as file:
        file.write(str(i) + " " + str(datetime.now()) + " ")
        file.write(DNA[i] + '\n')

    #run one per nine second (optional)
    #time.sleep(9)

    #writing csv file (overwrite each time)
    Data.to_csv('Google Trend.csv')

print("Crawling Done")

Solution

  • Move Data.to_csv('Google Trend.csv') after time.sleep(9) and change its mode to 'a':

    time.sleep(9)
    Data.to_csv('Google Trend.csv', mode='a')
    

    Mode 'a' appends to the end of your CSV file rather than overwriting it.
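
One thing to watch with append mode: Data.to_csv(..., mode='a') writes the whole accumulated frame on every call, so appending it inside the loop repeats earlier columns and the header in the file. A possible alternative, sketched here under some assumptions, is to append only the rows for the keyword just fetched, in long format (Date, keyword, value), and to stop cleanly when a request fails. The names append_keyword, crawl and the fetch callable are hypothetical. fetch(kw) is assumed to return a DataFrame like the one from pytrends.interest_over_time(), indexed by date with a column named after the keyword.

```python
import os

import pandas as pd


def append_keyword(df, keyword, out_path):
    """Append one keyword's values to out_path as (Date, keyword, value) rows."""
    if df.empty:
        return 0  # nothing to write for this keyword
    rows = pd.DataFrame({
        "Date": df.index,
        "keyword": keyword,
        "value": df[keyword].to_numpy(),
    })
    # Write the header only when the file is first created.
    write_header = not os.path.exists(out_path)
    rows.to_csv(out_path, mode="a", header=write_header, index=False)
    return len(rows)


def crawl(keywords, fetch, out_path):
    """Fetch keywords one by one, saving each result before the next request."""
    done = []
    for kw in keywords:
        try:
            df = fetch(kw)  # hypothetical callable, e.g. a wrapper around pytrends
        except Exception as exc:  # deliberately broad for this sketch
            print(f"Stopped at {kw!r}: {exc}")
            break
        append_keyword(df, kw, out_path)
        done.append(kw)
    return done
```

With pytrends, fetch could be a small wrapper that calls pytrends.build_payload([kw], ...) and returns pytrends.interest_over_time(), as in the question. Everything fetched before the failing request is already on disk, and the returned list shows where to resume. If one column per keyword is needed afterwards, the long file can be reshaped with pd.read_csv(out_path).pivot(index='Date', columns='keyword', values='value').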