Tags: python, amazon-web-services, aws-lambda, boto3, amazon-cloudwatch

How to query CloudWatch logs using boto3 in Python


I have a Lambda function that writes metrics to CloudWatch. While it writes metrics, it also generates some logs in a log group:

INFO:: username: [email protected] ClinicID: 7667 nodename: MacBook-Pro-2.local

INFO:: username: [email protected] ClinicID: 7667 nodename: MacBook-Pro-2.local

INFO:: username: [email protected] ClinicID: 7668 nodename: MacBook-Pro-2.local

INFO:: username: [email protected] ClinicID: 7667 nodename: MacBook-Pro-2.local

I would like to query these logs for the past x hours, where x could be anywhere between 12 and 24, based on any of the parameters.

For example:

  1. Query CloudWatch logs in the last 5 hours where ClinicID=7667

or

  2. Query CloudWatch logs in the last 5 hours where ClinicID=7667 and username='[email protected]'

or

  3. Query CloudWatch logs in the last 5 hours where username='[email protected]'

I am using boto3 in Python.


Solution

  • You can get what you want using CloudWatch Logs Insights.

    You would use the start_query and get_query_results APIs of the boto3 logs client: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/logs.html

    To start a query you would use (for use case 2 from your question, 1 and 3 are similar):

    import time
    from datetime import datetime, timedelta
    
    import boto3
    
    client = boto3.client('logs')
    
    # Parse the three fields out of each message, then filter on them.
    query = "fields @timestamp, @message | parse @message \"username: * ClinicID: * nodename: *\" as username, ClinicID, nodename | filter ClinicID = 7667 and username='[email protected]'"
    
    log_group = '/aws/lambda/NAME_OF_YOUR_LAMBDA_FUNCTION'
    
    # startTime and endTime are epoch seconds; this window covers the last 5 hours.
    now = datetime.now()
    start_query_response = client.start_query(
        logGroupName=log_group,
        startTime=int((now - timedelta(hours=5)).timestamp()),
        endTime=int(now.timestamp()),
        queryString=query,
    )
    
    query_id = start_query_response['queryId']
    
    response = None
    
    # Poll until the query is no longer scheduled or running.
    while response is None or response['status'] in ('Scheduled', 'Running'):
        print('Waiting for query to complete ...')
        time.sleep(1)
        response = client.get_query_results(
            queryId=query_id
        )
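
For use cases 1 and 3, only the filter clause of the query changes. As a rough sketch, the query string could be assembled by a small helper. The name build_query and its parameters are hypothetical (not part of boto3), and user@example.com is a placeholder address:

```python
def build_query(clinic_id=None, username=None):
    """Build a Logs Insights query string that filters on the given fields.

    Hypothetical helper: pass only the fields you want to filter on.
    """
    conditions = []
    if clinic_id is not None:
        conditions.append(f"ClinicID = {int(clinic_id)}")
    if username is not None:
        # No escaping is attempted here, so reject values containing quotes.
        if "'" in username or '"' in username:
            raise ValueError("username must not contain quote characters")
        conditions.append(f"username = '{username}'")

    # Same fields/parse stages as the query above.
    query = (
        "fields @timestamp, @message"
        ' | parse @message "username: * ClinicID: * nodename: *"'
        " as username, ClinicID, nodename"
    )
    if conditions:
        query += " | filter " + " and ".join(conditions)
    return query


# Use case 1: filter on ClinicID only.
query_1 = build_query(clinic_id=7667)
# Use case 3: filter on username only (placeholder address).
query_3 = build_query(username="user@example.com")
```

The returned string would then be passed as queryString to start_query, exactly as in the code above.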
    

    Once the status is 'Complete', the response will contain your data in this format (plus some metadata):

    {
      'results': [
        [
          {
            'field': '@timestamp',
            'value': '2019-12-09 17:07:24.428'
          },
          {
            'field': '@message',
            'value': 'username: [email protected] ClinicID: 7667 nodename: MacBook-Pro-2.local\n'
          },
          {
            'field': 'username',
            'value': '[email protected]'
          },
          {
            'field': 'ClinicID',
            'value': '7667'
          },
          {
            'field': 'nodename',
            'value': 'MacBook-Pro-2.local\n'
          }
        ]
      ]
    }
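
Each result row is a list of field/value pairs, which is awkward to work with directly. A minimal sketch of flattening the rows into plain dictionaries follows. The helper name rows_to_dicts is hypothetical, and the sample response uses a placeholder address:

```python
def rows_to_dicts(response):
    """Convert get_query_results output into a list of {field: value} dicts."""
    records = []
    for row in response.get("results", []):
        record = {}
        for cell in row:
            # '@ptr' is an internal record pointer that Logs Insights adds
            # to each row; it is skipped here.
            if cell["field"] == "@ptr":
                continue
            # Values are strings; strip the trailing newline seen in the sample.
            record[cell["field"]] = cell.get("value", "").rstrip("\n")
        records.append(record)
    return records


# Sample shaped like the response above (placeholder address).
sample_response = {
    "results": [
        [
            {"field": "@timestamp", "value": "2019-12-09 17:07:24.428"},
            {"field": "username", "value": "user@example.com"},
            {"field": "ClinicID", "value": "7667"},
            {"field": "nodename", "value": "MacBook-Pro-2.local\n"},
        ]
    ]
}

for record in rows_to_dicts(sample_response):
    print(record["ClinicID"], record["username"], record["nodename"])
```

Note that all values come back as strings, so ClinicID would need an explicit int() conversion if you want to compare it numerically in Python.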