
ECS cluster with Vector agent running to ingest API data and S3 object


I'm building a data pipeline that ingests data using a Vector agent running in an ECS cluster managed by Terraform. The Docker image reads its configuration file from an S3 bucket.

Right now the Vector agent runs perfectly in the ECS service that pulls data from the SNS topic into the S3 bucket.

Here's the Terraform code for it:

resource "aws_ecs_task_definition" "s3_task_def" {
  family                = "vector-s3-task"
  network_mode          = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                   = "256"
  memory                = "512"
  execution_role_arn    = aws_iam_role.logging_execution_role.arn
  task_role_arn         = aws_iam_role.logging_role.arn

  container_definitions = jsonencode([
    {
      "name":      "infosec-vector-container",
      "image":     "${aws_ecr_repository.repository.repository_url}:latest",
      "essential": true,
      "portMappings": [
        {
          "containerPort": 8686,
          "hostPort":      8686
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group":        aws_cloudwatch_log_group.log_group.name,
          "awslogs-region":       "us-west-2",
          "awslogs-stream-prefix": aws_cloudwatch_log_stream.log_stream.name
        }
      },
      "environment": [
        {
          "name":  "VECTOR_FILE",
          "value": var.vector
        }
      ]
    }
  ])
}


# creates an ECS service within the ECS cluster for s3 bucket
resource "aws_ecs_service" "service" {
  name            = "vector-s3-service"
  cluster         = aws_ecs_cluster.cluster.id
  task_definition = aws_ecs_task_definition.s3_task_def.arn
  desired_count   = var.desired_count
  launch_type     = "FARGATE"

  network_configuration {
    security_groups = [aws_security_group.sg.id]
    subnets         = [aws_subnet.private_subnet.id]
  }
}

However, when I add the ECS service that ingests data via the API, I start seeing unexpected entries in the ECS logs, and the data pulled from the API is not sent to the S3 bucket.

Here's the Terraform code for the API ECS service:

resource "aws_ecs_task_definition" "api_task_def" {
  family                = "vector-api-task"
  network_mode          = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                   = "512"
  memory                = "1024"
  execution_role_arn    = aws_iam_role.logging_execution_role.arn
  task_role_arn         = aws_iam_role.logging_role.arn

  container_definitions = jsonencode([
    {
      "name":      "infosec-vector-container",
      "image":     "${aws_ecr_repository.repository.repository_url}:latest",
      "essential": true,
      "portMappings": [
        {
          "containerPort": 8686,
          "hostPort":      8686
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group":        aws_cloudwatch_log_group.log_group.name,
          "awslogs-region":       "us-west-2",
          "awslogs-stream-prefix": aws_cloudwatch_log_stream.log_stream.name
        }
      },
      "environment": [
        {
          "name":  "VECTOR_FILE",
          "value": var.vector_api
        },
        {
          "name":  "SLACK_KEY",
          "value": data.aws_secretsmanager_secret_version.slack_secret.secret_string
        }
      ]
    }
  ])
}


# creates an ECS service within the ECS cluster for API polling
resource "aws_ecs_service" "api_service" {
  name            = "vector-api-service"
  cluster         = aws_ecs_cluster.cluster.id
  task_definition = aws_ecs_task_definition.api_task_def.arn
  desired_count   = 1
  launch_type     = "FARGATE"

  network_configuration {
    security_groups = [aws_security_group.sg.id]
    subnets         = [aws_subnet.private_subnet.id]
  }
}

Here's my vector.toml file for the API polling config:

###################################################################################################################
### SLACK AUDIT ###
###################################################################################################################

[sources.slack_audit]
type = "http_client"
endpoint = "https://api.slack.com/audit/v1/logs"
method = "GET"
scrape_interval_secs = 900
auth.strategy = "bearer"
auth.token = "${SLACK_KEY}"

    [sources.slack_audit.headers]
    Accept = ["application/json"]

[transforms.slack_audit_output]
type = "remap"
inputs = ["slack_audit"]
source = '''
    .source = "slack_audit"
    .vtime = now()
    .data = parse_json!(.message)
    del(.message)
'''

###################################################################################################################
### SLACK DATA SOURCE ###
###################################################################################################################
[sinks.infosec_log_prod]
type = "aws_s3"
inputs = ["*_output"]
bucket = "EXAMPLE"
key_prefix = "application={{ source }}/env=prod/year=%Y/month=%m/day=%d/"
region = "us-west-2"
compression = "gzip"
filename_extension = "json"
encoding.codec = "json"
encoding.timestamp_format = "rfc3339"

These are the logs I'm receiving in both ECS services' task logs, and I don't understand why:

{"appname":"shaneIxD","facility":"local2","hostname":"random.org","message":"A bug was encountered but not in Vector, which doesn't have bugs","msgid":"ID258","procid":240,"severity":"warning","timestamp":"2023-07-20T21:39:06.581Z","version":1}
{"appname":"meln1ks","facility":"syslog","hostname":"make.com","message":"We're gonna need a bigger boat","msgid":"ID452","procid":4132,"severity":"info","timestamp":"2023-07-20T21:39:07.580Z","version":1}
{"appname":"devankoshal","facility":"local5","hostname":"we.de","message":"Great Scott! We're never gonna reach 88 mph with the flux capacitor in its current state!","msgid":"ID517","procid":8273,"severity":"notice","timestamp":"2023-07-20T21:39:08.580Z","version":1}
{"appname":"ahmadajmi","facility":"user","hostname":"names.com","message":"A bug was encountered but not in Vector, which doesn't have bugs","msgid":"ID236","procid":6192,"severity":"notice","timestamp":"2023-07-20T21:39:09.580Z","version":2}

{"appname":"devankoshal","facility":"local7","hostname":"make.us","message":"There's a breach in the warp core, captain","msgid":"ID172","procid":5465,"severity":"info","timestamp":"2023-07-20T21:39:10.580Z","version":1}

{"appname":"devankoshal","facility":"news","hostname":"make.de","message":"A bug was encountered but not in Vector, which doesn't have bugs","msgid":"ID916","procid":5888,"severity":"emerg","timestamp":"2023-07-20T21:39:11.580Z","version":1}

Every time I restart the service, I get exactly the same logs. Has anyone encountered this before?

What I've tried so far:

I tested the configuration file locally, sending the data to my terminal instead of the S3 bucket, and it worked as expected.

I removed the S3 ECS service to see if that was the problem, but I still received the same logs.

I removed the API ECS service, and the S3 ECS service works fine on its own.


Solution

  • It turns out I was using the same CloudWatch log group and stream for both services, when I should have created a separate one for the API service.
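
A minimal sketch of that fix, assuming it means one CloudWatch log group per service. The resource names (s3_log_group, api_log_group), the log group names, the stream prefix, and the retention period are all hypothetical and not taken from the original configuration:

```hcl
# Hypothetical: one log group per ECS service instead of a shared one.
resource "aws_cloudwatch_log_group" "s3_log_group" {
  name              = "/ecs/vector-s3"  # assumed name
  retention_in_days = 30                # assumed retention
}

resource "aws_cloudwatch_log_group" "api_log_group" {
  name              = "/ecs/vector-api" # assumed name
  retention_in_days = 30                # assumed retention
}

# Log configuration for the API container, pointing at its own log group.
# The awslogs driver creates the streams itself, so the prefix can be a
# plain string rather than an aws_cloudwatch_log_stream resource.
locals {
  api_log_configuration = {
    logDriver = "awslogs"
    options = {
      "awslogs-group"         = aws_cloudwatch_log_group.api_log_group.name
      "awslogs-region"        = "us-west-2"
      "awslogs-stream-prefix" = "vector-api" # assumed prefix
    }
  }
}
```

In the api_task_def container definition, the "logConfiguration" value could then be set to local.api_log_configuration, with an equivalent block pointing at s3_log_group for the S3 task. With the output of each service in its own group, it becomes easier to tell which container is producing which log lines.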