
Insufficient data for CloudWatch alarm to detect running task count less than desired task count


I have the following Terraform to configure a CloudWatch alarm, which should trigger if the desired task count of an ECS service is greater than its running task count:

resource "aws_cloudwatch_metric_alarm" "ecs_task_count_alarm" {
  for_each            = toset(var.ecs_services) # Converts list to a set to iterate over unique service names
  alarm_name          = "ECS_Tasks_Running_Check_${each.key}"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  threshold           = 0
  alarm_description   = "Alarm if DesiredTaskCount is greater than the RunningTaskCount for the ECS service ${each.key}."

  # Metric query for RunningTaskCount
  metric_query {
    id = "running_task_count"
    metric {
      metric_name = "RunningTaskCount"
      namespace   = "AWS/ECS"
      period      = 60 # 1 minute
      stat        = "Average"
      dimensions = {
        ClusterName = var.ecs_cluster_name
        ServiceName = each.key
      }
    }
  }

  # Metric query for DesiredTaskCount
  metric_query {
    id = "desired_task_count"
    metric {
      metric_name = "DesiredTaskCount"
      namespace   = "AWS/ECS"
      period      = 60 # 1 minute
      stat        = "Average"
      dimensions = {
        ClusterName = var.ecs_cluster_name
        ServiceName = each.key
      }
    }
  }

  # Math expression to check if RunningTaskCount is less than DesiredTaskCount
  metric_query {
    id          = "number_of_tasks_we_want_but_arent_getting"
    expression  = "desired_task_count - running_task_count"
    label       = "Difference between Desired and Running Task Counts"
    return_data = true
  }

  alarm_actions             = [aws_sns_topic.sns_alarms.arn]
  insufficient_data_actions = [aws_sns_topic.sns_alarms.arn]
  ok_actions                = [aws_sns_topic.sns_alarms.arn]
}

However, the alarm reports insufficient data.

ecs_services is a list of service names (["foo", "bar"]), and ecs_cluster_name is the name of the cluster the services run in.

What am I missing?


Solution

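  • Insufficient data means CloudWatch found no datapoints for the query. Usually either the metric is new and has not published anything yet, or the namespace, metric name, or dimensions don't match a metric that actually exists.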

    In this case the problem is the namespace the alarm is querying.

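    The AWS/ECS namespace does not contain task-count metrics; for services it publishes utilization metrics such as CPUUtilization and MemoryUtilization. DesiredTaskCount and RunningTaskCount are published by Container Insights under the ECS/ContainerInsights namespace, with the same ClusterName and ServiceName dimensions you are already using. Container Insights must be enabled on the cluster for these metrics to exist; otherwise the alarm will keep reporting insufficient data even with the correct namespace.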
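
    If Container Insights is not yet enabled on the cluster, nothing is published to ECS/ContainerInsights. Below is a minimal sketch of enabling it in Terraform, assuming the cluster is managed in the same configuration; the resource label "main" is hypothetical, so adapt it to however your cluster is actually defined.

    ```hcl
    # Hypothetical cluster resource: adjust the label and any other
    # arguments to match your existing cluster definition.
    resource "aws_ecs_cluster" "main" {
      name = var.ecs_cluster_name

      # Turns on Container Insights, which publishes RunningTaskCount,
      # DesiredTaskCount and PendingTaskCount to ECS/ContainerInsights.
      setting {
        name  = "containerInsights"
        value = "enabled"
      }
    }
    ```

    If the cluster is created outside Terraform, Container Insights can also be enabled from the ECS console. Metrics are only published from the moment it is turned on, so the alarm will still show insufficient data for the first few minutes.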

    A quick tip: before writing the alarm, check in the CloudWatch console that the namespace, metric names, and dimensions you plan to use actually have data. This will save you a lot of time and help avoid issues like this.

    Updated Code

    Only the two metric queries change; the math expression and the rest of the resource stay as they are:

    # Metric query for RunningTaskCount
    metric_query {
      id = "running_task_count"
      metric {
        metric_name = "RunningTaskCount"
        namespace   = "ECS/ContainerInsights"
        period      = 60 # 1 minute
        stat        = "Average"
        dimensions = {
          ClusterName = var.ecs_cluster_name
          ServiceName = each.key
        }
      }
    }

    # Metric query for DesiredTaskCount
    metric_query {
      id = "desired_task_count"
      metric {
        metric_name = "DesiredTaskCount"
        namespace   = "ECS/ContainerInsights"
        period      = 60 # 1 minute
        stat        = "Average"
        dimensions = {
          ClusterName = var.ecs_cluster_name
          ServiceName = each.key
        }
      }
    }
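
    For reference, the complete resource might look like the sketch below. It keeps your variables and SNS topic as they are and switches both metric queries to ECS/ContainerInsights. One setting differs from your original and is an assumption to tune: evaluation_periods = 3, so that the short window during a deployment or scale-out, when the desired count is briefly ahead of the running count, does not fire the alarm. Set it back to 1 if you want to be alerted immediately.

    ```hcl
    resource "aws_cloudwatch_metric_alarm" "ecs_task_count_alarm" {
      for_each            = toset(var.ecs_services) # one alarm per service name
      alarm_name          = "ECS_Tasks_Running_Check_${each.key}"
      comparison_operator = "GreaterThanThreshold"
      evaluation_periods  = 3 # assumption: the original used 1
      threshold           = 0
      alarm_description   = "Alarm if DesiredTaskCount is greater than the RunningTaskCount for the ECS service ${each.key}."

      # RunningTaskCount, published only when Container Insights is enabled
      metric_query {
        id = "running_task_count"
        metric {
          metric_name = "RunningTaskCount"
          namespace   = "ECS/ContainerInsights"
          period      = 60 # 1 minute
          stat        = "Average"
          dimensions = {
            ClusterName = var.ecs_cluster_name
            ServiceName = each.key
          }
        }
      }

      # DesiredTaskCount, from the same namespace and dimensions
      metric_query {
        id = "desired_task_count"
        metric {
          metric_name = "DesiredTaskCount"
          namespace   = "ECS/ContainerInsights"
          period      = 60 # 1 minute
          stat        = "Average"
          dimensions = {
            ClusterName = var.ecs_cluster_name
            ServiceName = each.key
          }
        }
      }

      # Positive when fewer tasks are running than desired; this is the
      # only series returned, so it is the one the alarm evaluates.
      metric_query {
        id          = "number_of_tasks_we_want_but_arent_getting"
        expression  = "desired_task_count - running_task_count"
        label       = "Difference between Desired and Running Task Counts"
        return_data = true
      }

      alarm_actions             = [aws_sns_topic.sns_alarms.arn]
      insufficient_data_actions = [aws_sns_topic.sns_alarms.arn]
      ok_actions                = [aws_sns_topic.sns_alarms.arn]
    }
    ```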