python  google-api  google-analytics-api  google-api-python-client

How do I create a Google API script to pull data by the hour?


Essentially, I want to run this script every hour so that each run pulls in only the data from the previous hour, and so on for every hour of the day. Each run should return all of the data for that hour. How would I do this? The only option I can see is a filter, but I have read that a filter only pulls a sample and then filters on the hour within that sample.

    def get_report(analytics):
        # VIEW_ID is assumed to be defined elsewhere in the script.
        return analytics.reports().batchGet(
            body={
                'reportRequests': [{
                    'viewId': VIEW_ID,
                    # Relative dates use the NdaysAgo form.
                    'dateRanges': [{'startDate': '1daysAgo',
                                    'endDate': 'today'}],
                    'metrics': [{'expression': 'ga:uniquepageviews'},
                                {'expression': 'ga:timeonpage'},
                                {'expression': 'ga:entrances'},
                                {'expression': 'ga:exits'},
                                {'expression': 'ga:pageviews'}],
                    'dimensions': [{'name': 'ga:dimension97'},
                                   {'name': 'ga:dimension69'},
                                   {'name': 'ga:dateHourMinute'}],
                }]
            }
        ).execute()

Solution

  • Python's standard library has a sched module, which lets the script schedule its own runs. Save the following code into a file and then execute it.

    There are several options for keeping the script running: a terminal window, a tmux session, a background process, etc.

    I used to use cron a lot but have changed to using the Python sched module. It can be easier to troubleshoot.

    Save this code into a file, make it executable with chmod 755 <myfile.py>, then run the script: ./myfile.py

    #!/usr/bin/env python3
    
    import sched
    import time
    from datetime import datetime, timedelta
    
    # Create a scheduler instance.
    scheduler = sched.scheduler(timefunc=time.time)
    
    def reschedule(interval: dict=None):
        """Define how often the action function will run.
        Pass a dict interval {'hours': 1} to make it run every hour.
        """
        interval = {'minutes': 1} if interval is None else interval
        # Get the current time and remove the seconds and microseconds.
        now = datetime.now().replace(second=0, microsecond=0)
        # Add the time interval to now
        target = now + timedelta(**interval)
        # Schedule the task
        scheduler.enterabs(target.timestamp(), priority=0, action=get_report)
    
    def get_report(analytics=None):
        # Replace the print call with the code that executes the Google API call.
        print(time.ctime())
    
        reschedule() # Reschedule so it runs again.
    
    if __name__ == "__main__":
        reschedule() # start
    
        try:
            scheduler.run(blocking=True)
        except KeyboardInterrupt:
            print('Stopped.')
    

    OUTPUT:

    Tue Oct 29 22:35:00 2019
    Tue Oct 29 22:36:00 2019
    Stopped.
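
The scheduler above only decides when get_report runs; the request still has to be narrowed to the previous hour. A minimal sketch of one way to do that, assuming the Reporting API v4 dimensionFilterClauses and samplingLevel request fields and a BEGINS_WITH filter on the ga:dateHourMinute dimension the question already requests. The function name previous_hour_request and the single metric are placeholders, not part of the original code, and the sketch assumes the machine's local clock matches the view's time zone.

```python
from datetime import datetime, timedelta


def previous_hour_request(view_id, now=None):
    """Build a batchGet body limited to the last complete clock hour.

    Hypothetical helper: view_id and now are illustrative parameters.
    """
    now = datetime.now() if now is None else now
    # Truncate to the top of the current hour, then step back one hour.
    hour_start = now.replace(minute=0, second=0, microsecond=0) - timedelta(hours=1)
    day = hour_start.strftime('%Y-%m-%d')
    # ga:dateHourMinute values look like YYYYMMDDHHMM, so a YYYYMMDDHH
    # prefix matches every minute of that hour.
    hour_prefix = hour_start.strftime('%Y%m%d%H')
    return {
        'reportRequests': [{
            'viewId': view_id,
            # Restrict the date range to the single day containing the hour.
            'dateRanges': [{'startDate': day, 'endDate': day}],
            # Ask for the largest sample size the API will use.
            'samplingLevel': 'LARGE',
            # Placeholder metric; substitute the metrics from the question.
            'metrics': [{'expression': 'ga:pageviews'}],
            'dimensions': [{'name': 'ga:dateHourMinute'}],
            'dimensionFilterClauses': [{
                'filters': [{
                    'dimensionName': 'ga:dateHourMinute',
                    'operator': 'BEGINS_WITH',
                    'expressions': [hour_prefix],
                }]
            }],
        }]
    }
```

Inside get_report, the call would then look something like analytics.reports().batchGet(body=previous_hour_request(VIEW_ID)).execute(), with reschedule({'hours': 1}) in place of reschedule() so the next run is an hour later. Narrowing the date range to one day keeps the queried data small, which should make sampling less likely, but it does not guarantee an unsampled result; as far as I know, a sampled v4 response carries samplesReadCounts and samplingSpaceSizes fields that can be checked.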