amazon-web-services amazon-elastic-beanstalk amazon-cloudwatch

How to show the percentage of uptime of an AWS service on the dashboard of CloudWatch?

I want to build a dashboard that displays the percentage of the uptime for each month of an Elastic Beanstalk service in my company.

So I used boto3's get_metric_data to retrieve the EnvironmentHealth CloudWatch metric and calculate the percentage of time my service was not in a severe state.

from datetime import datetime
import boto3

SEVERE = 25

client = boto3.client('cloudwatch')

metric_data_queries = [
    {
        'Id': 'healthStatus', 
        'MetricStat': {
            'Metric': {
                'Namespace': 'AWS/ElasticBeanstalk',
                'MetricName': 'EnvironmentHealth',
                'Dimensions': [
                    {
                        'Name': 'EnvironmentName', 
                        'Value': 'ServiceA'
                    }
                ]
            },
            'Period': 300,
            'Stat': 'Maximum'
        },
        'Label': 'EnvironmentHealth',
        'ReturnData': True
    }
]

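response = client.get_metric_data(
    MetricDataQueries=metric_data_queries,
    StartTime=datetime(2019, 9, 1),
    # datetime(2019, 9, 30) is midnight at the start of September 30 and
    # would drop the last day, so end at October 1 to cover the whole month
    EndTime=datetime(2019, 10, 1),
    ScanBy='TimestampAscending'
)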

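# Values holds one number per 5-minute datapoint in the requested window
health_data = response['MetricDataResults'][0]['Values']
total_times = len(health_data)
severe_times = health_data.count(SEVERE)
print(f'total_times: {total_times}')
print(f'severe_times: {severe_times}')
# Multiply by 100 so the printed number is a percentage, not a fraction
print(f'healthy percent: {100 * (1 - severe_times / total_times):.2f}')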

Now I'm wondering how to show this percentage on a CloudWatch dashboard.

Does anyone know how to upload the healthy percent I've calculated to the dashboard of CloudWatch?

Or is there any other tool that is more appropriate for displaying the uptime of my service?

Solution

You can do math with CloudWatch metrics: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/using-metric-math.html

You can create a metric math expression from the metric you have in metric_data_queries and show the result on a graph. Metric math also works with the GetMetricData API, so you could move your calculation into a MetricDataQuery and get the number you need directly from CloudWatch.
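As a rough sketch of moving the calculation into GetMetricData, the queries could look like the following. It uses the same functions as the graph source in this answer (ABS, CEIL, AVG, MAX), with the constant 25 inlined and the final `1-` left out so that the result matches the `healthy percent` from the question. The function names are hypothetical, the client is passed in so nothing here touches AWS on import, and the expression has not been checked against a live account.

```python
from datetime import datetime


def build_healthy_percent_queries(environment_name, period=300, severe=25):
    """Build MetricDataQueries so CloudWatch computes the non-severe share itself."""
    return [
        {
            # Raw metric, used only as input to the expression below
            'Id': 'm1',
            'MetricStat': {
                'Metric': {
                    'Namespace': 'AWS/ElasticBeanstalk',
                    'MetricName': 'EnvironmentHealth',
                    'Dimensions': [
                        {'Name': 'EnvironmentName', 'Value': environment_name}
                    ],
                },
                'Period': period,
                'Stat': 'Maximum',
            },
            'ReturnData': False,
        },
        {
            # m1*0 turns the scalar average into a time series (assumption:
            # the API, like the graph, needs a time series as the result)
            'Id': 'e1',
            'Expression': f'(m1*0+AVG(CEIL(ABS(m1-{severe})/MAX(m1))))*100',
            'Label': 'Healthy percent',
            'ReturnData': True,
        },
    ]


def fetch_healthy_percent(client, environment_name, start, end):
    """Return the healthy percentage for the window, or None if there is no data."""
    response = client.get_metric_data(
        MetricDataQueries=build_healthy_percent_queries(environment_name),
        StartTime=start,
        EndTime=end,
        ScanBy='TimestampAscending',
    )
    # Look the result up by Id rather than by position
    result = next(r for r in response['MetricDataResults'] if r['Id'] == 'e1')
    # Every datapoint carries the same constant, so the first one is enough
    return result['Values'][0] if result['Values'] else None
```

With a real client this would be called as `fetch_healthy_percent(boto3.client('cloudwatch'), 'ServiceA', datetime(2019, 9, 1), datetime(2019, 10, 1))`.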

It looks like you need a number that says what percentage of datapoints in the last month had a metric value equal to 25.

You can calculate it like this (this is the source of the graph; you can paste it into the Source tab in the CloudWatch console. Make sure the region matches your region and the metric name matches your metric):

{
  "metrics": [
    [
      "AWS/ElasticBeanstalk",
      "EnvironmentHealth",
      "EnvironmentName",
      "ServiceA",
      {
        "label": "metric",
        "id": "m1",
        "visible": false,
        "stat": "Maximum"
      }
    ],
    [
      {
        "expression": "25",
        "label": "Value for severe",
        "id": "severe_c",
        "visible": false
      }
    ],
    [
      {
        "expression": "m1*0",
        "label": "Constant 0 time series",
        "id": "zero_ts",
        "visible": false
      }
    ],
    [
      {
        "expression": "1-AVG(CEIL(ABS(m1-severe_c)/MAX(m1)))",
        "label": "Percentage of times value equals severe",
        "id": "severe_pct",
        "visible": false
      }
    ],
    [
      {
        "expression": "(zero_ts+severe_pct)*100",
        "label": "Service Uptime",
        "id": "e1"
      }
    ]
  ],
  "view": "singleValue",
  "stacked": false,
  "region": "eu-west-1",
  "period": 300
}
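If you would rather create the widget from code than paste the source into the console, a minimal sketch with boto3's put_dashboard could look like this. It assumes the graph source above has been loaded into a Python dict; the function names and the widget size are hypothetical. Note that put_dashboard replaces the entire body of an existing dashboard with the same name.

```python
import json


def build_dashboard_body(graph_source, width=6, height=3):
    """Wrap one graph source (as a dict) in a single-widget dashboard body."""
    widget = {
        'type': 'metric',
        'x': 0,
        'y': 0,
        'width': width,
        'height': height,
        # The graph source goes in unchanged as the widget's properties
        'properties': graph_source,
    }
    # put_dashboard expects the body as a JSON string, not a dict
    return json.dumps({'widgets': [widget]})


def publish_dashboard(client, dashboard_name, graph_source):
    """Create or overwrite the dashboard using the given CloudWatch client."""
    return client.put_dashboard(
        DashboardName=dashboard_name,
        DashboardBody=build_dashboard_body(graph_source),
    )
```

For example, `publish_dashboard(boto3.client('cloudwatch'), 'service-uptime', graph_source)`, where `service-uptime` is a made-up dashboard name.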

To explain what each element of the graph source does, by id:

m1 - Your original metric, with stat set to Maximum.
severe_c - The constant to use for your SEVERE value.
zero_ts - A constant time series with all values equal to zero. It is needed because constants can't be graphed and the final value is a constant, so to graph it we add the constant to this time series of zeros.
severe_pct - This is where you actually calculate the percentage of values that are equal to SEVERE:
- m1-severe_c - sets the datapoints with a value equal to SEVERE to 0.
- ABS(m1-severe_c) - makes all values non-negative and keeps SEVERE datapoints at 0.
- ABS(m1-severe_c)/MAX(m1) - dividing by the maximum value scales all values to between 0 and 1. This is guaranteed when the metric reached SEVERE at least once in the window, so that MAX(m1) equals severe_c.
- CEIL(ABS(m1-severe_c)/MAX(m1)) - snaps every non-zero value to 1 and keeps SEVERE at 0.
- AVG(CEIL(ABS(m1-severe_c)/MAX(m1))) - because the metric is now all 1s and 0s, with 0 meaning SEVERE, taking the average gives you the fraction of non-severe datapoints.
- 1-AVG(CEIL(ABS(m1-severe_c)/MAX(m1))) - finally, you need the fraction of severe values, and since every value is either severe or not severe, subtracting from 1 gives you that number.
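e1 - The last expression gave you a constant between 0 and 1. You need a time series between 0 and 100. This is the expression that gives you that: (zero_ts+severe_pct)*100. Note that this is the only result you're returning; all other expressions have "visible": false. Because severe_pct is the share of severe datapoints, use (zero_ts+1-severe_pct)*100 instead if you want the widget to show the share of non-severe datapoints, which is the uptime.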
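To check the arithmetic of severe_pct without touching CloudWatch, the same steps can be replayed locally on a handful of made-up datapoints. This is only an illustration in plain Python: the sample values and the function name are assumptions, and the code mimics the metric math functions rather than calling them.

```python
import math


def severe_share(values, severe=25):
    """Replay 1-AVG(CEIL(ABS(m1-severe_c)/MAX(m1))) on a plain list of numbers."""
    peak = max(values)  # MAX(m1); assumes at least one non-zero datapoint
    # CEIL(ABS(m1-severe_c)/MAX(m1)): 0 for SEVERE datapoints, 1 for the rest
    flags = [math.ceil(abs(v - severe) / peak) for v in values]
    non_severe = sum(flags) / len(flags)  # AVG(...): fraction of non-severe
    return 1 - non_severe                 # 1-AVG(...): fraction of severe


# Eight hypothetical datapoints, two of them SEVERE (25)
sample = [0, 0, 15, 25, 25, 0, 20, 0]
share = severe_share(sample)
print(f'severe: {share * 100:.1f}%')            # severe: 25.0%
print(f'non-severe: {(1 - share) * 100:.1f}%')  # non-severe: 75.0%
```

Two of the eight sample datapoints are severe, so severe_pct comes out at 0.25 and the non-severe share at 0.75.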