Tags: python, boto3, amazon-cloudwatch, amazon-cloudwatchlogs, aws-cloudwatch-log-insights

Query CloudWatch Logs for distinct values using boto3 in Python


I have a Lambda function that writes metrics to CloudWatch. While it writes metrics, it also generates some logs in a log group:

INFO:: username: simran+test@abc.com ClinicID: 7667 nodename: MacBook-Pro-2.local

INFO:: username: simran+test2@abc.com ClinicID: 7669 nodename: MacBook-Pro-3.local

INFO:: username: simran+test@abc.com ClinicID: 7668 nodename: MacBook-Pro-4.local

INFO:: username: simran+test3@abc.com ClinicID: 7667 nodename: MacBook-Pro-5.local

INFO:: username: simran+test3@abc.com ClinicID: 7667 nodename: MacBook-Pro-2.local

I need an efficient way to get the distinct values of nodename for a given ClinicID. For example, if I pass in 7667 as the ClinicID, I expect:

['MacBook-Pro-2.local', 'MacBook-Pro-5.local']

This is what I tried:

    query = "fields @timestamp, @message | parse @message \"username: * ClinicID: * nodename: *\" as username, ClinicID, nodename | filter ClinicID = " + clinic_id

    start_query_response = client.start_query(
        logGroupName=log_group,
        startTime=int(time.mktime((Util.utcnow() - timedelta(hours=hours)).timetuple())),
        endTime=int(time.mktime(Util.utcnow().timetuple())),
        queryString=query,
    )

I considered iterating over the query results in Python, but I do not like that idea. Since I will be looking at more than 7 days of logs, I need something more efficient than iterating over every log line from the past 7 days for the given ClinicID.


Solution

  • You can pipe your expression to the stats command and count the occurrences of each nodename.

    Add this to the end of your query:

    | stats count(*) by nodename
    

    The results returned by get_query_results will then look like this:

    {
      'results': [
        [
          {
            'field': 'nodename',
            'value': 'MacBook-Pro-2.local\n'
          },
          {
            'field': 'count(*)',
            'value': '2'
          }
        ],
        [
          {
            'field': 'nodename',
            'value': 'MacBook-Pro-5.local\n'
          },
          {
            'field': 'count(*)',
            'value': '1'
          }
        ]
      ]
    }
    

    See here for more details on various commands: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_QuerySyntax.html
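
Putting the pieces together, here is a minimal sketch of the whole flow, not a definitive implementation. It assumes `client` is a boto3 CloudWatch Logs client (`boto3.client("logs")`); the helper names `build_query`, `extract_nodenames`, and `distinct_nodenames`, the `poll_seconds` parameter, and the log group name in the usage comment are hypothetical and chosen only for illustration. Note that `start_query` only returns a query ID, so the results have to be fetched by polling `get_query_results`, and the trailing newline on each parsed nodename is stripped.

```python
import time
from datetime import datetime, timedelta, timezone


def build_query(clinic_id):
    """Build the Logs Insights query: parse, filter by clinic, group by nodename."""
    return (
        'fields @message'
        ' | parse @message "username: * ClinicID: * nodename: *"'
        ' as username, ClinicID, nodename'
        # int() rejects anything that is not a plain number before it is
        # interpolated into the query string.
        f' | filter ClinicID = {int(clinic_id)}'
        ' | stats count(*) by nodename'
    )


def extract_nodenames(results):
    """Collect the distinct nodename values from get_query_results rows."""
    names = set()
    for row in results:
        for cell in row:
            if cell["field"] == "nodename":
                # The parsed value may carry a trailing newline, so strip it.
                names.add(cell["value"].strip())
    return sorted(names)


def distinct_nodenames(client, log_group, clinic_id, hours, poll_seconds=1.0):
    """Run the query over the last `hours` hours and return distinct nodenames."""
    now = datetime.now(timezone.utc)
    query_id = client.start_query(
        logGroupName=log_group,
        startTime=int((now - timedelta(hours=hours)).timestamp()),
        endTime=int(now.timestamp()),
        queryString=build_query(clinic_id),
    )["queryId"]

    # start_query only returns a query ID; poll until the query finishes.
    while True:
        response = client.get_query_results(queryId=query_id)
        status = response["status"]
        if status == "Complete":
            return extract_nodenames(response["results"])
        if status in ("Failed", "Cancelled", "Timeout", "Unknown"):
            raise RuntimeError(f"query ended with status {status}")
        time.sleep(poll_seconds)


# Usage (requires AWS credentials; the log group name is a placeholder):
#   import boto3
#   client = boto3.client("logs")
#   print(distinct_nodenames(client, "/aws/lambda/my-function", 7667, hours=7 * 24))
```

With this approach the grouping happens inside CloudWatch Logs Insights, so Python only iterates over one row per distinct nodename rather than over every log line.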