Glue Python Shell - Private Subnet Access


I have a Redshift cluster in a private subnet. I am trying to write an UNLOAD job using Glue Python Shell, but I cannot connect to the cluster because it resides in a private subnet. I tried adding both a JDBC connection and a Redshift connection, but neither worked.

I went through this article, but unfortunately I still do not understand the workflow.

How can I connect Glue Python Shell to a Redshift cluster in a private subnet? It would be great if someone could help me understand this workflow.


Solution

  • I took the following steps to connect my Glue Python Shell job to the Redshift cluster in the private subnet.

    1. Define the JDBC Connection
      ● Go to the Glue Console
      ● Under Connections, add a new JDBC connection
      ● Provide the necessary details for your Redshift endpoint, such as:
      -> JDBC URL : jdbc:redshift://host:port/database
      -> Username and Password
      ● For VPC ID, choose the VPC of the Redshift cluster itself
      ● For Subnet ID, choose the same subnet as the Redshift cluster
      ● For Security Group, choose the same Security Group used by the Redshift cluster
      ● Once done, save this connection
    2. Change the Security Group : Navigate to the Redshift Security Group selected in the first step and make the following changes.
      ● Copy the Security Group ID
      ● Edit the Security Group
      ● Under Inbound Rules, choose ALL TCP and paste the Security Group ID as the source (in other words, add a self-referencing Security Group rule for ALL TCP)
      ● Save the Security Group
    3. Navigate to the Glue Console again and, under Connections, choose the connection defined in Step 1 and test it. This option is available in the console itself.
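
    For anyone who prefers scripting steps 1 and 2 instead of clicking through the console, here is a rough boto3 sketch. It is not what I actually ran, only an approximation of the same settings. The helper names and all argument values (connection name, subnet, security group, availability zone) are placeholders I made up for illustration, and the explicit AvailabilityZone value is an assumption (the console fills it in from the subnet). The connection test in step 3 is still done in the console.

```python
def build_connection_input(name, jdbc_url, username, password,
                           subnet_id, security_group_id, availability_zone):
    """Build the ConnectionInput payload for Glue's create_connection (step 1)."""
    return {
        "Name": name,
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": jdbc_url,  # jdbc:redshift://host:port/database
            "USERNAME": username,
            "PASSWORD": password,
        },
        # Same subnet and Security Group as the Redshift cluster.
        "PhysicalConnectionRequirements": {
            "SubnetId": subnet_id,
            "SecurityGroupIdList": [security_group_id],
            "AvailabilityZone": availability_zone,  # assumed: AZ of that subnet
        },
    }


def build_self_referencing_rule(security_group_id):
    """Build the ALL TCP inbound rule whose source is the group itself (step 2)."""
    return {
        "GroupId": security_group_id,
        "IpPermissions": [{
            "IpProtocol": "tcp",
            "FromPort": 0,
            "ToPort": 65535,
            "UserIdGroupPairs": [{"GroupId": security_group_id}],
        }],
    }


def apply_settings(connection_input, ingress_rule):
    """Send both requests to AWS. Needs boto3 and credentials with Glue/EC2 rights."""
    import boto3  # imported here so the builders above work without boto3

    boto3.client("glue").create_connection(ConnectionInput=connection_input)
    boto3.client("ec2").authorize_security_group_ingress(**ingress_rule)
```

    The two builder functions only assemble dictionaries, so you can print and review them before calling apply_settings.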

    If the configuration is correct, you will see a success message. Then go to your job and, under Connections, choose the connection defined above; the job will then be able to reach the cluster.
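
    Once the connection is attached to the job, the UNLOAD script itself could look roughly like the sketch below. Treat it as an illustration under assumptions, not a tested job: it assumes the redshift_connector package is available in the Python Shell environment, that the job can still reach the Glue API from inside the VPC (for example through a NAT gateway or a VPC endpoint), and that Redshift has an IAM role allowed to write to the target bucket. The function names, the connection name, the S3 path, and the role ARN are hypothetical.

```python
import re


def parse_jdbc_url(jdbc_url):
    """Split jdbc:redshift://host:port/database into (host, port, database)."""
    match = re.fullmatch(r"jdbc:redshift://([^:/]+):(\d+)/([^/?]+)", jdbc_url)
    if match is None:
        raise ValueError(f"Unexpected JDBC URL format: {jdbc_url}")
    host, port, database = match.groups()
    return host, int(port), database


def build_unload_sql(query, s3_prefix, iam_role_arn):
    """Wrap a SELECT in an UNLOAD statement, doubling single quotes in the query."""
    escaped = query.replace("'", "''")
    return (
        f"UNLOAD ('{escaped}') TO '{s3_prefix}' "
        f"IAM_ROLE '{iam_role_arn}' FORMAT AS PARQUET"
    )


def run_unload(connection_name, sql):
    """Read the endpoint and credentials from the Glue connection and run the SQL."""
    import boto3               # imported lazily so the helpers above run anywhere
    import redshift_connector  # assumed to be installed in the job environment

    props = boto3.client("glue").get_connection(
        Name=connection_name, HidePassword=False
    )["Connection"]["ConnectionProperties"]
    host, port, database = parse_jdbc_url(props["JDBC_CONNECTION_URL"])

    conn = redshift_connector.connect(
        host=host, port=port, database=database,
        user=props["USERNAME"], password=props["PASSWORD"],
    )
    try:
        conn.autocommit = True
        conn.cursor().execute(sql)
    finally:
        conn.close()
```

    A call would then be something like run_unload("example-conn", build_unload_sql("select * from my_table", "s3://example-bucket/prefix/", "arn:aws:iam::123456789012:role/ExampleRole")), with all three values replaced by your own.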

    References :

    How can I access aws resources in VPC from AWS glue?
    https://docs.aws.amazon.com/glue/latest/dg/setup-vpc-for-glue-access.html
    https://docs.aws.amazon.com/glue/latest/dg/connection-JDBC-VPC.html
    https://aws.amazon.com/blogs/big-data/how-to-access-and-analyze-on-premises-data-stores-using-aws-glue/
    https://docs.aws.amazon.com/glue/latest/dg/how-it-works.html

    Hope it helps..!!!