jdbc aws-glue netezza

Connection timeout when reading Netezza from AWS Glue


I am trying to use AWS Glue to pull data from my on-premises Netezza database into S3. The code I have written so far (not complete):

df = glueContext.read.format("jdbc")\
    .option("driver", "org.netezza.Driver")\
    .option("url", "jdbc:netezza://NetezzaHost01:5480/Netezza_DB")\
    .option("dbtable", "ADMIN.table1")\
    .option("user", "myUser")\
    .option("password", "myPassword")\
    .load()

print(df.count())

I am using a custom JDBC driver jar, since AWS Glue does not support Netezza natively (the driver is provided by IBM), and I specify it as a dependency when triggering the job.

This code keeps failing with a timeout error:

py4j.protocol.Py4JJavaError: An error occurred while calling o68.load.
: org.netezza.error.NzSQLException: Connection timed out (Connection timed out)

A few things I have tried which did not work:

- Use Spark instead of Glue to read
- Use a very small table (<100 rows) as source

I should add that the Netezza database is behind a corporate firewall, but I do not see any options to specify security groups (as you can do with Glue native connections) when using custom drivers.

Any thoughts?


Solution

  • 1) Because the Netezza host is on-premises, first verify that it is reachable from the VPC you have chosen for your Glue job.

    2) The catch is that the VPC is determined by the connection you attach to the Glue job, and the Glue connection options apparently do not list Netezza as supported. You can still create a connection and enter the Netezza URL. The connection test might not work, but you will at least be able to choose the subnet and security group. The security group should allow traffic on the Netezza port (5480 in your URL).

    3) I'm guessing your VPC has a Direct Connect or VPN link to your office network. As long as your firewall accepts connections from the CIDR range of the subnet you assigned to the Glue job, it should work. You may need to ask the team that manages the Netezza firewall to allow connections from your VPC/subnet IP range.
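
To separate a network problem from a driver problem, the reachability check in step 1 and the CIDR check in step 3 can be sketched in plain Python using only the standard library. This is a rough diagnostic, not part of Glue or the Netezza driver: the function names `can_reach` and `subnet_is_allowed` are made up for illustration, and the CIDR values you pass in would have to come from your own subnet and from whoever manages the firewall.

```python
import ipaddress
import socket


def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False


def subnet_is_allowed(subnet_cidr, allowed_cidrs):
    """Return True if `subnet_cidr` lies entirely inside one of `allowed_cidrs`.

    `allowed_cidrs` stands in for the source ranges the firewall accepts;
    that list is an assumption and must be obtained from the firewall team.
    """
    subnet = ipaddress.ip_network(subnet_cidr)
    for cidr in allowed_cidrs:
        allowed = ipaddress.ip_network(cidr)
        # subnet_of() raises TypeError on mixed IPv4/IPv6, so compare versions first.
        if allowed.version == subnet.version and subnet.subnet_of(allowed):
            return True
    return False
```

Calling `print(can_reach("NetezzaHost01", 5480))` at the top of the Glue job script, before the JDBC read, would show whether the job's subnet can open a TCP connection to the database at all. If it prints `False`, the timeout is most likely a routing, security group, or firewall issue rather than anything in the JDBC options.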