postgresql · google-cloud-platform · pyspark · ssl-certificate · google-cloud-sql

How to provide an SSL server certificate, client certificate, and SSL key in a PySpark JDBC connection when connecting to PostgreSQL in GCP from on-prem


I am trying to connect from on-prem PySpark to PostgreSQL in GCP. How do I provide the required certificates, and what is the PySpark syntax to connect to GCP PostgreSQL?

df2 = spark.read.format('jdbc')\
              .option('driver', 'org.postgresql.Driver')\
              .option('url', 'XXXXXX')\
              .option('dbtable', 'XXXXXX')\
              .option('user', 'XXXXX')\
              .option('password', 'XXXXXX')\
              .option('ssl', True)\
              .option('sslmode', 'require')\
              .load()

I am getting this error:

org.postgresql.util.PSQLException: FATAL: connection requires a valid client certificate


Solution

  • The error means the Cloud SQL instance is configured to require trusted client certificates, so sslmode='require' alone is not enough: you also need to set the sslrootcert, sslcert, and sslkey options in your existing code to resolve the issue. See the implementation below for details:

    df2 = spark.read.format('jdbc')\
                  .option('driver', 'org.postgresql.Driver')\
                  .option('url', 'jdbc:postgresql://<host>:<port>/<database>')\
                  .option('dbtable', 'XXXXXX')\
                  .option('user', 'XXXXX')\
                  .option('password', 'XXXXXX')\
                  .option('ssl', True)\
                  .option('sslmode', 'require')\
                  .option('sslrootcert', '<path_to_server_ca_certificate>')\
                  .option('sslcert', '<path_to_client_certificate>')\
                  .option('sslkey', '<path_to_client_key>')\
                  .load()
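
  • Note that the certificate and key files must exist at the given paths on every node that opens the JDBC connection (the executors, not just the driver), so distribute them to your on-prem workers or put them on a shared filesystem. You may also want sslmode='verify-ca' instead of 'require' so the server certificate is actually validated against the server CA.

  • One more caveat: the PostgreSQL JDBC driver (depending on version) expects the sslkey file in PKCS#8 DER format, not the PEM format of the client-key.pem you download from Cloud SQL. A PEM key can be converted with openssl pkcs8 -topk8 -inform PEM -outform DER -in client-key.pem -out client-key.pk8 -nocrypt, and then sslkey should point at the .pk8 file. If you prefer to do the conversion from Python, here is a minimal sketch using the third-party cryptography package; the file names are the Cloud SQL defaults and the key is assumed to be unencrypted:

    from cryptography.hazmat.primitives import serialization

    # Read the PEM client key downloaded from Cloud SQL (assumed unencrypted).
    with open('client-key.pem', 'rb') as f:
        key = serialization.load_pem_private_key(f.read(), password=None)

    # Write it back out as PKCS#8 DER, the format the JDBC driver expects
    # for the sslkey option.
    with open('client-key.pk8', 'wb') as f:
        f.write(key.private_bytes(
            encoding=serialization.Encoding.DER,
            format=serialization.PrivateFormat.PKCS8,
            encryption_algorithm=serialization.NoEncryption(),
        ))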