Tags: logging, flask, amazon-ecs, datadog, aws-fargate

Collecting Python logs to DataDog in ECS


I am having a hard time collecting logs from a Python app deployed in ECS using the Datadog Agent. I have a dockerized Flask app deployed in ECS that writes its logs to stdout, and I now need to monitor them in Datadog.
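
For reference, the app's logging setup is minimal. A rough sketch of what it does (the route, format string, and handler wiring here are illustrative, not the real code):

  import logging
  import sys

  from flask import Flask

  app = Flask(__name__)

  # Send application logs to stdout so the container runtime captures them.
  handler = logging.StreamHandler(sys.stdout)
  handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s"))
  app.logger.addHandler(handler)
  app.logger.setLevel(logging.INFO)

  @app.route("/")
  def index():
      app.logger.info("Handling request to /")
      return "ok"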

I've added a Datadog Agent container (the Fargate-compatible setup, since I am using Fargate), which runs as part of the same task as the app. I can see CPU and memory metrics for both containers in app.datadoghq.com/containers, so the Datadog Agent itself is working.

I now need the app logs. I went through the documentation at https://app.datadoghq.com/logs/onboarding/container and added

  "dockerLabels": {
    "com.datadoghq.ad.logs": "[{\"source\": \"python\", \"service\": \"flask\"}]"
  },

to the app container, and the following environment variables to the Datadog container:

  "environment": [
    {
      "name": "DD_API_KEY",
      "value": "<key>"
    },
    {
      "name": "DD_LOGS_ENABLED",
      "value": "true"
    },
    {
      "name": "DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL",
      "value": "true"
    },
    {
      "name": "SD_BACKEND",
      "value": "docker"
    },
    {
      "name": "ECS_FARGATE",
      "value": "true"
    }
  ]
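
For context, here is roughly how those two pieces sit together in the task definition, expressed as a boto3 call purely for illustration (the family name, image URIs, role ARN, and CPU/memory values are placeholders, not the real ones):

  import boto3

  ecs = boto3.client("ecs")

  # The app container carries the com.datadoghq.ad.logs label;
  # the Datadog agent sidecar carries the environment variables.
  ecs.register_task_definition(
      family="flask-app",  # placeholder
      requiresCompatibilities=["FARGATE"],
      networkMode="awsvpc",
      cpu="256",
      memory="512",
      executionRoleArn="arn:aws:iam::123456789012:role/ecsTaskExecutionRole",  # placeholder
      containerDefinitions=[
          {
              "name": "flask-app",
              "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/flask-app:latest",  # placeholder
              "essential": True,
              "dockerLabels": {
                  "com.datadoghq.ad.logs": '[{"source": "python", "service": "flask"}]'
              },
          },
          {
              "name": "datadog-agent",
              "image": "datadog/agent:latest",
              "essential": True,
              "environment": [
                  {"name": "DD_API_KEY", "value": "<key>"},
                  {"name": "DD_LOGS_ENABLED", "value": "true"},
                  {"name": "DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL", "value": "true"},
                  {"name": "SD_BACKEND", "value": "docker"},
                  {"name": "ECS_FARGATE", "value": "true"},
              ],
          },
      ],
  )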

But that seems to be insufficient. Am I going in the right direction? What am I missing?


Solution

  • I discussed this with Datadog support, and they confirmed that the awslogs logging driver prevents the Datadog agent container from accessing a container's logs. Since awslogs is currently the only logging driver available to tasks using the Fargate launch type, getting logs into Datadog will require another method.

    Since the awslogs logging driver emits logs to CloudWatch, one method that I have used is to create a subscription that streams those log groups to Datadog's Lambda function, as configured here. You can do that from the Lambda side, using CloudWatch Logs as the trigger, or from the CloudWatch Logs side, by clicking Actions > Stream to AWS Lambda.
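
    If you would rather script the subscription than click through the console, it can also be created with boto3. A sketch, with the log group name, region, account ID, and forwarder ARN as placeholders:

    import boto3

    LOG_GROUP = "/ecs/flask-app"  # placeholder: the task's awslogs log group
    FORWARDER_ARN = "arn:aws:lambda:us-east-1:123456789012:function:datadog-log-forwarder"  # placeholder

    # Allow CloudWatch Logs to invoke the forwarder Lambda.
    boto3.client("lambda").add_permission(
        FunctionName=FORWARDER_ARN,
        StatementId="cloudwatch-logs-invoke",
        Action="lambda:InvokeFunction",
        Principal="logs.amazonaws.com",
        SourceArn="arn:aws:logs:us-east-1:123456789012:log-group:/ecs/flask-app:*",  # placeholder
    )

    # Stream every event in the log group to the forwarder.
    boto3.client("logs").put_subscription_filter(
        logGroupName=LOG_GROUP,
        filterName="datadog-forwarder",
        filterPattern="",  # empty pattern matches all log events
        destinationArn=FORWARDER_ARN,
    )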

    I chose the Lambda option because it was quick and easy and required no code changes to our applications (since we are still in the evaluation stage). Datadog support advised me that it was necessary to modify the Lambda function in order to attribute logs to the corresponding service:

    In the Lambda function, modify this block to something like:

    # "logs" here is the decoded CloudWatch Logs payload; "log" is one log
    # event taken from it.
    structured_line = merge_dicts(log, {
        "syslog.hostname": logs["logStream"],
        "syslog.path": logs["logGroup"],
        # syslog.appname should name the service the logs belong to (see below).
        "syslog.appname": logs["logGroup"],
        "aws": {
            "awslogs": {
                "logGroup": logs["logGroup"],
                "logStream": logs["logStream"],
                "owner": logs["owner"]
            }
        }
    })
    

    According to Datadog support:

    1. syslog.appname needs to match an existing APM service in order to correlate logs to the service.
    2. This solution is not fully supported at the moment and they are working on documenting this more thoroughly.

    I had to make further modifications to set the value of the syslog.* keys in a way that made sense for our applications, but it works great.
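
    The exact changes will depend on your naming scheme. As an illustration only (assuming, hypothetically, that the log groups look like "/ecs/<service-name>", and reusing the merge_dicts/log/logs names from the snippet above):

    def service_from_log_group(log_group):
        # Hypothetical mapping: "/ecs/flask" -> "flask". Adjust to your convention.
        return log_group.strip("/").split("/")[-1]

    structured_line = merge_dicts(log, {
        "syslog.hostname": logs["logStream"],
        "syslog.path": logs["logGroup"],
        # Must match an existing APM service for the correlation described above.
        "syslog.appname": service_from_log_group(logs["logGroup"]),
        "aws": {
            "awslogs": {
                "logGroup": logs["logGroup"],
                "logStream": logs["logStream"],
                "owner": logs["owner"]
            }
        }
    })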