amazon-web-services amazon-redshift aws-glue aws-databrew

Unable to reach AWS Glue to get connection in DataBrew


I'm trying to get started with AWS Glue DataBrew using a connection to Redshift. I added the connection in AWS Glue, and it works when I test it there. When DataBrew tries to use this connection, it fails with the following error. DataBrew and Glue are both in the same region.

{"error":"Failure reading from input connection AwsGlueDataBrew-databrew-to-redshift with \"public.table\": Unable to reach AWS Glue to get connection AwsGlueDataBrew-databrew-to-redshift. Exception: Connect timeout on endpoint URL: \"https://glue.us-west-2.amazonaws.com/\""}

The policy attached to the project looks like this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabases",
                "glue:GetPartitions",
                "glue:GetTable",
                "glue:GetTables",
                "glue:GetConnection"
            ],
            "Resource": [
                "*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::databrew-public-datasets-*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeVpcEndpoints",
                "ec2:DescribeRouteTables",
                "ec2:DescribeNetworkInterfaces",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeVpcAttribute",
                "ec2:CreateNetworkInterface"
            ],
            "Resource": [
                "*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": "ec2:DeleteNetworkInterface",
            "Condition": {
                "StringLike": {
                    "aws:ResourceTag/aws-glue-service-resource": "*"
                }
            },
            "Resource": [
                "*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateTags",
                "ec2:DeleteTags"
            ],
            "Condition": {
                "ForAllValues:StringEquals": {
                    "aws:TagKeys": [
                        "aws-glue-service-resource"
                    ]
                }
            },
            "Resource": [
                "arn:aws:ec2:*:*:network-interface/*",
                "arn:aws:ec2:*:*:security-group/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": [
                "arn:aws:logs:*:*:log-group:/aws-glue-databrew/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "lakeformation:GetDataAccess"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetSecretValue"
            ],
            "Resource": "arn:aws:secretsmanager:*:*:secret:databrew!default-*"
        }
    ]
}

Can someone please help me resolve this issue?

Thank you.


Solution

  • This happens because, when you use a project or job, the DataBrew service has to reach the AWS Glue service endpoint from inside your VPC to fetch the connection, and that call is timing out. (The AWS Glue "test connection" feature works differently, which is why the test succeeds.)

    There are two ways to resolve this:

    1. Add a VPC endpoint for the AWS Glue service to your VPC. This gives traffic from the VPC a private, secure path to the Glue API.
    2. Give your VPC a route to the public internet (for example, through a NAT gateway), so that API calls to the Glue service can travel over the internet and succeed.

    I recommend option #1 because it is more secure (and simpler), although the endpoint does add some cost.
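
For option #1, the endpoint can be created from the console, or roughly as in the boto3 sketch below. This is a minimal illustration, not a definitive setup: the helper names `glue_endpoint_params` and `create_glue_endpoint` are hypothetical, and the VPC, subnet, and security group IDs are placeholders that you would replace with the ones used by your Glue connection. It assumes the security group allows inbound HTTPS (port 443) from the connection's subnet, and that DNS support and DNS hostnames are enabled on the VPC so that private DNS can resolve `glue.us-west-2.amazonaws.com` to the endpoint.

```python
from typing import Any, Dict, List


def glue_endpoint_params(
    region: str,
    vpc_id: str,
    subnet_ids: List[str],
    security_group_ids: List[str],
) -> Dict[str, Any]:
    """Build the keyword arguments for EC2 create_vpc_endpoint.

    Hypothetical helper: it only assembles the request for an
    interface endpoint to the Glue service in the given region.
    """
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        # Interface endpoint service names follow com.amazonaws.<region>.<service>
        "ServiceName": f"com.amazonaws.{region}.glue",
        "SubnetIds": subnet_ids,
        "SecurityGroupIds": security_group_ids,
        # Lets the default Glue hostname resolve to the endpoint inside the VPC
        "PrivateDnsEnabled": True,
    }


def create_glue_endpoint(
    region: str,
    vpc_id: str,
    subnet_ids: List[str],
    security_group_ids: List[str],
) -> str:
    """Create the endpoint and return its ID (requires AWS credentials)."""
    import boto3  # imported here so the parameter builder works without the SDK

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.create_vpc_endpoint(
        **glue_endpoint_params(region, vpc_id, subnet_ids, security_group_ids)
    )
    return response["VpcEndpoint"]["VpcEndpointId"]


if __name__ == "__main__":
    # Placeholder IDs: substitute the VPC, subnet, and security group
    # configured on the Glue connection before calling create_glue_endpoint.
    print(glue_endpoint_params("us-west-2", "vpc-EXAMPLE", ["subnet-EXAMPLE"], ["sg-EXAMPLE"]))
```

Once the endpoint reports as available, rerunning the DataBrew project or job should show whether the connect timeout is gone.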