Tags: python, amazon-web-services, boto3, amazon-cloudfront

Boto3 CloudFront Object Usage Count


I'm looking to count the number of times each object in my CloudFront distribution has been hit, so that I can generate an Excel sheet to track usage stats. I've been looking through the boto3 docs for CloudFront, but I haven't been able to pin down where that information can be accessed. I see that the AWS CloudFront console generates a 'Popular Objects' report. Does anyone know how to get the numbers that AWS generates for that report through boto3?

If it's not accessible through Boto3, would there be an AWS CLI command that I should use instead?

UPDATE:

Here's the pseudo-code I ended up using; hopefully it's a starting point for someone else:

import boto3
import gzip
from datetime import datetime, date, timedelta
import shutil
from xlwt import Workbook
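# NOTE: AWS_ACCESS_KEY_ID, PASSWORD (the secret access key), AWS_STORAGE_BUCKET_NAME
# and CLOUDFRONT_IDENTIFIER are assumed to be defined elsewhere (e.g. a settings
# module); fill them in with your own values before running.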

def analyze(timeInterval):
    """
    analyze usage data in cloudfront
    :param domain:
    :param id:
    :param password:
    :return: usage data
    """
    outputList = []
    outputDict = {}

    s3 = boto3.resource('s3', aws_access_key_id=AWS_ACCESS_KEY_ID, aws_secret_access_key=PASSWORD)
    data = s3.Bucket(AWS_STORAGE_BUCKET_NAME)
    count = 0
    # today's date, used as the reference point for the reporting window
    currentDatetime = date.today()

    # create excel workbook/sheet that we'll save results to
    wb = Workbook()
    sheet1 = wb.add_sheet('Log Results By URL')
    sheet1.write(0, 1, 'File')
    sheet1.write(0, 2, 'Total Hit Count')
    sheet1.write(0, 3, 'Total Byte Count')

    for item in data.objects.all():
        count += 1
        # CloudFront log keys look like <identifier>.YYYY-MM-DD-HH.<unique-id>.gz;
        # strip the identifier prefix and the trailing hour to recover the log's date
        dateString = str(item.key).replace(CLOUDFRONT_IDENTIFIER + '.', '').split('.')[0][:-3]
        year, month, day = map(int, dateString.split('-'))
        datetimeRef = date(year=year, month=month, day=day)
        # print('comparing', datetimeRef - timedelta(days=1), currentDatetime)
        # keep only logs inside the requested reporting window
        intervalDeltas = {
            'daily': timedelta(days=1),
            'weekly': timedelta(days=7),
            'monthly': timedelta(weeks=4),
            'yearly': timedelta(weeks=52),
        }
        windowDelta = intervalDeltas.get(timeInterval)
        if windowDelta is not None and datetimeRef <= currentDatetime - windowDelta:
            # file not within datetime restrictions, skip it
            continue
        print('datetimeRef', datetimeRef)
        print('currentDatetime', currentDatetime)
        print('Analyzing File:', item.key)

        # download the file
        s3.Bucket(AWS_STORAGE_BUCKET_NAME).download_file(item.key, 'logFile.gz')

        # unzip the file
        with gzip.open('logFile.gz', 'rb') as f_in:
            with open('logFile.txt', 'wb') as f_out:
                shutil.copyfileobj(f_in, f_out)

        # read the decompressed log, skipping the two header lines
        # (#Version and #Fields); resetting the list for each file
        # avoids double-counting lines from previously processed logs
        with open('logFile.txt', 'r') as f:
            outputList = f.readlines()[2:]

        # print(outputList)
        # tally hit count and bytes served per URI; in the standard log
        # format, field 7 is cs-uri-stem and field 3 is sc-bytes
        for dataline in outputList:
            fields = dataline.split('\t')
            uri, byteCount = fields[7], int(fields[3])
            if uri not in outputDict:
                outputDict[uri] = {'count': 1, 'byteCount': byteCount}
            else:
                outputDict[uri]['count'] += 1
                outputDict[uri]['byteCount'] += byteCount

    # print(outputDict)
    # write each URI's totals to the spreadsheet, one row per object
    row = 1
    for uri, totals in outputDict.items():
        sheet1.write(row, 1, str(uri))
        sheet1.write(row, 2, totals['count'])
        sheet1.write(row, 3, totals['byteCount'])
        row += 1
    safeDateTime = str(datetime.now()).replace(':', '.')

    # save the workbook
    wb.save(timeInterval + '_Log_Result_' + safeDateTime + '.xls')


if __name__ == '__main__':
    analyze('daily')

Solution

  • From Configuring and Using Standard Logs (Access Logs) - Amazon CloudFront:

    You can configure CloudFront to create log files that contain detailed information about every user request that CloudFront receives. These are called standard logs, also known as access logs. These standard logs are available for both web and RTMP distributions. If you enable standard logs, you can also specify the Amazon S3 bucket that you want CloudFront to save files in.

    The log files can be quite large, but you can query them directly with Amazon Athena (see Query Amazon CloudFront Logs using Amazon Athena); two sketches of that approach follow below.
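If you want to manage this from Python as well, standard logging can be switched on through the CloudFront API itself. Below is a minimal sketch using boto3's `get_distribution_config` and `update_distribution` calls; the distribution ID and the log bucket are placeholders you would replace with your own values, and the rest of the distribution config is left untouched.

    import boto3

    cloudfront = boto3.client('cloudfront')
    distribution_id = 'E1234567890ABC'  # placeholder: your distribution ID

    # fetch the current config together with its ETag (required for updates)
    response = cloudfront.get_distribution_config(Id=distribution_id)
    config = response['DistributionConfig']
    etag = response['ETag']

    # turn on standard (access) logging to an S3 bucket of your choice
    config['Logging'] = {
        'Enabled': True,
        'IncludeCookies': False,
        'Bucket': 'my-log-bucket.s3.amazonaws.com',  # placeholder log bucket
        'Prefix': 'cloudfront-logs/',
    }

    cloudfront.update_distribution(
        DistributionConfig=config,
        Id=distribution_id,
        IfMatch=etag,
    )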
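Once the logs are landing in S3, Athena can produce the per-object hit counts directly instead of downloading and parsing every gzip file. The sketch below assumes you have already created the Athena table over the logs described in the linked documentation, here named `cloudfront_logs` in the `default` database, and that `s3://my-athena-results/` is a bucket you own for query output; those names, and the `uri`/`bytes` column names, are assumptions taken from the documented CREATE TABLE statement, so adjust them to match your own setup.

    import time
    import boto3

    athena = boto3.client('athena')

    # count requests and bytes served per object, mirroring the numbers
    # behind the 'Popular Objects' report; table/database names are placeholders
    query = """
        SELECT uri, COUNT(*) AS hits, SUM(bytes) AS total_bytes
        FROM cloudfront_logs
        GROUP BY uri
        ORDER BY hits DESC
    """

    execution = athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={'Database': 'default'},
        ResultConfiguration={'OutputLocation': 's3://my-athena-results/'},
    )
    query_id = execution['QueryExecutionId']

    # poll until the query finishes
    while True:
        state = athena.get_query_execution(QueryExecutionId=query_id)
        status = state['QueryExecution']['Status']['State']
        if status in ('SUCCEEDED', 'FAILED', 'CANCELLED'):
            break
        time.sleep(1)

    if status == 'SUCCEEDED':
        results = athena.get_query_results(QueryExecutionId=query_id)
        for row in results['ResultSet']['Rows'][1:]:  # first row is the header
            uri, hits, total_bytes = (col.get('VarCharValue') for col in row['Data'])
            print(uri, hits, total_bytes)

The per-URI rows this returns could be written to the same xlwt workbook as in the question, without the per-file download and gunzip steps.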