amazon-web-services amazon-ec2 alarm

Creating Alarms in AWS for EC2 Instances


How can I create alarms for the following cases? 1) Alert when an EC2 instance runs for too long (say, 1 hour). 2) Alert when the number of EC2 instances reaches a threshold (say, 5 instances at a time).

One more requirement: the alerts should apply only to specific EC2 instances, say those whose instance name starts with "test".

When I try to create the alarm, I don't see anything matching this logic under Metrics. The standard metrics include CPU Utilization, Network In, Network Out, etc.

Is there a way to create these alarms, either by defining custom metrics or through some other option?


Solution

  • I recently implemented a solution (see GitHub repo) that creates alarms for EC2 instances based on runtime; the same approach can be adapted for instance count. Here's how I approached it:

    1. Create an AWS Lambda function that is triggered periodically by an Amazon EventBridge (CloudWatch Events) rule.
    2. In the Lambda function, use the Boto3 library to describe all running EC2 instances and filter them based on specific tags (e.g., instances with names starting with "test").
    3. For each relevant instance, calculate the runtime by comparing the current time with the instance's launch time. If the runtime exceeds a specified threshold (e.g., 1 hour), send an alert using Amazon SNS or another notification mechanism.
    4. To alert when the number of EC2 instances reaches a threshold (e.g., 5 instances), keep a count of the relevant running instances and send an alert if the count exceeds the threshold.
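Step 1 can be done in the console, but as a rough sketch it could also be scripted with Boto3. The helper name `schedule_lambda`, the rule name, and the 5-minute rate below are illustrative assumptions, not part of the original solution; the clients are passed in as arguments so the function can be exercised with stubs.

```python
def schedule_lambda(events_client, lambda_client, function_arn,
                    rule_name="ec2-runtime-check",
                    schedule="rate(5 minutes)"):
    """Create a scheduled EventBridge rule that invokes the given Lambda.

    rule_name and schedule are hypothetical defaults chosen for illustration.
    """
    # Create (or overwrite) the scheduled rule
    rule = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=schedule,
        State="ENABLED",
    )

    # Grant EventBridge permission to invoke the function for this rule
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )

    # Attach the function as the rule's target
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "1", "Arn": function_arn}],
    )
    return rule["RuleArn"]
```

In real use the clients would come from `boto3.client("events")` and `boto3.client("lambda")`. Note that `add_permission` fails if a statement with the same ID already exists, so this sketch is meant to be run once per rule.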

    Here's a simplified version of the Lambda function:

    import boto3
    from datetime import datetime, timezone

    def lambda_handler(event, context):
        ec2 = boto3.client('ec2')

        # Specify the desired runtime threshold in hours
        runtime_threshold = 1

        # Specify the desired instance count threshold
        instance_count_threshold = 5

        # Take one timestamp so every instance is measured against the same moment
        current_time = datetime.now(timezone.utc)

        # Page through all running EC2 instances whose Name tag starts with "test"
        paginator = ec2.get_paginator('describe_instances')
        pages = paginator.paginate(Filters=[
            {'Name': 'instance-state-name', 'Values': ['running']},
            {'Name': 'tag:Name', 'Values': ['test*']}
        ])

        instance_count = 0
        for page in pages:
            for reservation in page['Reservations']:
                for instance in reservation['Instances']:
                    instance_count += 1

                    # Calculate runtime (LaunchTime is a timezone-aware datetime)
                    runtime = current_time - instance['LaunchTime']
                    runtime_hours = runtime.total_seconds() / 3600

                    if runtime_hours > runtime_threshold:
                        # Send runtime alert
                        send_alert(f"Instance {instance['InstanceId']} has been running for {runtime_hours:.2f} hours.")

        if instance_count > instance_count_threshold:
            # Send instance count alert
            send_alert(f"There are currently {instance_count} running instances.")

    def send_alert(message):
        # Implement your alert mechanism here (e.g., SNS, email)
        print(message)
    

    This Lambda function retrieves all running EC2 instances with names starting with "test", calculates their runtime, and sends alerts if the runtime exceeds the specified threshold. It also sends an alert if the total count of relevant instances exceeds the specified threshold.
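If a native CloudWatch alarm is preferred over alerting from the function itself, one option is to publish the instance count as a custom metric and alarm on that. The following is a minimal sketch under assumptions: the namespace `Custom/EC2`, the metric name `RunningTestInstances`, the alarm name, and both helper names are made up for illustration, and the SNS topic ARN has to be supplied by you.

```python
def publish_instance_count(cloudwatch, instance_count,
                           namespace="Custom/EC2",
                           metric_name="RunningTestInstances"):
    """Publish the current count as a custom metric (names are hypothetical)."""
    cloudwatch.put_metric_data(
        Namespace=namespace,
        MetricData=[{
            "MetricName": metric_name,
            "Value": instance_count,
            "Unit": "Count",
        }],
    )


def create_count_alarm(cloudwatch, topic_arn, threshold=5,
                       namespace="Custom/EC2",
                       metric_name="RunningTestInstances",
                       period_seconds=300):
    """Create an alarm that fires when the published count exceeds threshold."""
    cloudwatch.put_metric_alarm(
        AlarmName="too-many-test-instances",  # hypothetical alarm name
        Namespace=namespace,
        MetricName=metric_name,
        Statistic="Maximum",
        Period=period_seconds,
        EvaluationPeriods=1,
        Threshold=threshold,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=[topic_arn],
        # Do not alarm just because the Lambda has not reported yet
        TreatMissingData="notBreaching",
    )
```

Here `publish_instance_count` would be called at the end of each Lambda run with a `boto3.client("cloudwatch")` client, while `create_count_alarm` only needs to run once. The alarm period should be at least as long as the Lambda schedule interval so that each period contains a data point.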

    Note: Make sure to replace the body of the send_alert function with your desired alert mechanism (e.g., SNS, email); as written, it only prints the message to the Lambda logs.
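As one possible replacement for the placeholder, an SNS-based `send_alert` might look like the sketch below. It assumes an existing SNS topic whose ARN is provided through a hypothetical `ALERT_TOPIC_ARN` environment variable, and that the Lambda execution role allows `sns:Publish`. The optional `sns_client` and `topic_arn` parameters are additions made here only so the function can be tested with a stub.

```python
import os


def send_alert(message, sns_client=None, topic_arn=None):
    """Publish an alert message to an SNS topic and return its message ID."""
    # ALERT_TOPIC_ARN is a hypothetical environment variable name
    topic_arn = topic_arn or os.environ["ALERT_TOPIC_ARN"]

    if sns_client is None:
        # Imported lazily so the function can be tested without AWS access
        import boto3
        sns_client = boto3.client("sns")

    response = sns_client.publish(
        TopicArn=topic_arn,
        Subject="EC2 alert",
        Message=message,
    )
    return response["MessageId"]
```

Because the extra parameters are optional, existing calls such as `send_alert("...")` in the handler keep working as long as the environment variable is set.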

    I hope this helps!