AWS Glue Connection to AWS RDS SQL Server in VPC for AWS Glue Crawler - Connection fails


Context: I have a SQL Server instance on AWS RDS in an AWS VPC. The network ACL allows all inbound and outbound traffic. The Security Group of the SQL Server instance allows inbound TCP traffic on port 1433 (the SQL Server default port) and outbound traffic on ports 80 (HTTP) and 443 (HTTPS). The username and password for the SQL Server are stored in AWS Secrets Manager. I can successfully connect to the SQL Server and its databases.

Goal: I want to use an AWS Glue crawler to crawl a subset of tables in one of the databases and store their metadata in the AWS Glue Data Catalog.

Problem: After I create the AWS Glue connection, the connection test fails.

How do I need to configure the connection and VPC so that I can successfully establish a connection?


Solution

  • The following steps helped me to successfully establish a connection:

    • Create an IAM role that AWS Glue can assume and that grants the required permissions: a trust policy for "glue.amazonaws.com", plus permissions to work with AWS Glue (e.g. AWSGlueServiceRole), to read data from the SQL database (e.g. AmazonRDSReadOnlyAccess), and to access the secret in AWS Secrets Manager (e.g. SecretsManagerReadWrite, which should be replaced by a more restrictive custom policy).

    • When the connection is first created in the AWS Management Console, the VPC configuration block is not available. It is nevertheless essential to configure the connection with the VPC, Subnet and Security Group, so after the initial creation the connection has to be edited again to add this information.

    • The Security Group of the RDS instance requires a self-referencing inbound rule and a self-referencing outbound rule for all traffic.

    • It is necessary to add two endpoints to the VPC of the RDS instance:
      • S3: com.amazonaws.<region>.s3, type Gateway, in the VPC of the RDS instance, associated with the related route table
      • Secrets Manager: com.amazonaws.<region>.secretsmanager, in the VPC of the RDS instance, with the Subnet and Security Group of the RDS instance
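
The IAM role and the connection settings from the first two steps can be sketched as plain Python dictionaries. This is only an illustration under assumptions, not the exact configuration I used: the function names, the secret ARN, and all IDs are hypothetical placeholders, and the custom secret policy assumes that read access to a single secret (secretsmanager:GetSecretValue) is sufficient.

```python
import json


def glue_trust_policy() -> dict:
    """Trust policy that allows the AWS Glue service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "glue.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def secret_read_policy(secret_arn: str) -> dict:
    """Restricted alternative to SecretsManagerReadWrite (assumption:
    reading this one secret is all the connection needs)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                "Resource": secret_arn,
            }
        ],
    }


def connection_input(name, jdbc_url, secret_id, subnet_id,
                     security_group_id, availability_zone) -> dict:
    """ConnectionInput for a JDBC connection, including the VPC block
    (PhysicalConnectionRequirements) that the console omits at first."""
    return {
        "Name": name,
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": jdbc_url,
            "SECRET_ID": secret_id,
        },
        "PhysicalConnectionRequirements": {
            "SubnetId": subnet_id,
            "SecurityGroupIdList": [security_group_id],
            "AvailabilityZone": availability_zone,
        },
    }


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    print(json.dumps(glue_trust_policy(), indent=2))
    print(json.dumps(connection_input(
        "sqlserver-connection",
        "jdbc:sqlserver://example-host:1433;databaseName=example_db",
        "example/sqlserver/credentials",
        "subnet-0123456789abcdef0",
        "sg-0123456789abcdef0",
        "eu-west-1a",
    ), indent=2))
```

With boto3 installed and credentials configured, the trust policy could be passed as `AssumeRolePolicyDocument=json.dumps(glue_trust_policy())` to `boto3.client("iam").create_role(...)`, and the connection dictionary as `ConnectionInput=` to `boto3.client("glue").create_connection(...)`.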

    After completing all of these steps, the connection worked for me.
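
The network steps (self-referencing Security Group rules and the two VPC endpoints) can be sketched in the same way as request parameters for the boto3 EC2 client. Again this is a sketch under assumptions: the helper names and all IDs are hypothetical, and the Secrets Manager endpoint is declared as type Interface because that is the endpoint type the service offers.

```python
def self_reference_permissions(security_group_id: str) -> list:
    """IpPermissions entry that allows all traffic from/to the group itself.
    IpProtocol "-1" means all protocols and ports."""
    return [
        {
            "IpProtocol": "-1",
            "UserIdGroupPairs": [{"GroupId": security_group_id}],
        }
    ]


def s3_gateway_endpoint(region: str, vpc_id: str, route_table_id: str) -> dict:
    """Parameters for the S3 gateway endpoint in the VPC of the RDS instance."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": [route_table_id],
    }


def secrets_manager_endpoint(region: str, vpc_id: str, subnet_id: str,
                             security_group_id: str) -> dict:
    """Parameters for the Secrets Manager interface endpoint, placed in the
    Subnet and Security Group of the RDS instance."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.secretsmanager",
        "SubnetIds": [subnet_id],
        "SecurityGroupIds": [security_group_id],
    }


if __name__ == "__main__":
    # Hypothetical placeholder IDs, printed only to show the resulting shapes.
    print(self_reference_permissions("sg-0123456789abcdef0"))
    print(s3_gateway_endpoint("eu-west-1", "vpc-0123456789abcdef0",
                              "rtb-0123456789abcdef0"))
    print(secrets_manager_endpoint("eu-west-1", "vpc-0123456789abcdef0",
                                   "subnet-0123456789abcdef0",
                                   "sg-0123456789abcdef0"))
```

With `ec2 = boto3.client("ec2")`, the rule list could be passed as `IpPermissions=` to both `ec2.authorize_security_group_ingress(GroupId=...)` and `ec2.authorize_security_group_egress(GroupId=...)`, and each endpoint dictionary could be unpacked into `ec2.create_vpc_endpoint(**params)`.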