amazon-web-services aws-glue aws-glue-data-catalog

Glue job fails with a connection timeout error

I have a Glue ETL job which reads data from the catalog and writes it to S3. Once this is done, a crawler needs to be triggered to update the data in Athena.

So I'm using the glue_client.start_crawler(Name='crawler_name') method to start the crawler. But whenever I try to start the crawler from within the Glue ETL job, it fails with the following error:

ConnectTimeoutError: Connect timeout on endpoint URL: "https://glue.eu-central-1.amazonaws.com/"

Solution

When you launch a Glue job inside a VPC by attaching a connection, the job runs in that VPC's subnet and its traffic stays inside the AWS network instead of going through the public internet.

That is why the boto3 start_crawler call cannot reach the public Glue API endpoint and times out. To fix it, add a Glue VPC endpoint to the VPC and make the start_crawler request with endpoint_url set, as shown below:

import boto3

# Point the client at the regional Glue endpoint explicitly.
glue = boto3.client(service_name='glue', region_name='eu-central-1',
                    endpoint_url='https://glue.eu-central-1.amazonaws.com')
glue.start_crawler(Name='crawler_name')
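
If the Glue VPC endpoint does not exist yet, it can also be created with boto3 instead of the console. The following is only a rough sketch, not a definitive setup: the helper names glue_endpoint_params and create_glue_endpoint are made up for this example, and the VPC, subnet and security group IDs are placeholders you would replace with the ones used by your Glue connection. It assumes an interface endpoint with private DNS enabled, and the security group has to allow inbound HTTPS (port 443) from the job.

```python
def glue_endpoint_params(vpc_id, subnet_ids, security_group_ids,
                         region="eu-central-1"):
    """Build the arguments for ec2.create_vpc_endpoint (hypothetical helper)."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        # Service name of the Glue API for interface endpoints in this region.
        "ServiceName": f"com.amazonaws.{region}.glue",
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        # With private DNS, the regional Glue hostname resolves to the
        # endpoint's private IPs inside the VPC.
        "PrivateDnsEnabled": True,
    }


def create_glue_endpoint(params, region="eu-central-1"):
    """Create the endpoint; needs boto3 and ec2:CreateVpcEndpoint permission."""
    import boto3  # imported here so building the params works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.create_vpc_endpoint(**params)


# Example call (placeholder IDs, assumptions only):
# params = glue_endpoint_params("vpc-0123456789abcdef0",
#                               ["subnet-0aaa"], ["sg-0bbb"])
# create_glue_endpoint(params)
```

Run this once from a machine with the right IAM permissions, not from inside the Glue job itself. Once the endpoint is available, the start_crawler call from the job should no longer time out.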