I have a Greenplum cluster on Azure whose data I can access from my local machine using a plain JDBC connection. Now I am trying to use the same JDBC driver in Spark SQL as follows:
val url = s"jdbc:pivotal:greenplum://$server:$port;DatabaseName=$database"
spark.sqlContext.read.format("jdbc")
  .options(Map(
    "url" -> url, "user" -> user, "password" -> password,
    "dbschema" -> "public", "dbtable" -> table
  ))
  .load()
When the load() is executed, the following error is thrown:
Exception in thread "main" java.sql.SQLSyntaxErrorException: [Pivotal][Greenplum JDBC Driver][Greenplum]prepared statement "PS1" already exists.
at com.pivotal.jdbc.greenplumbase.ddcd.b(Unknown Source)
at com.pivotal.jdbc.greenplumbase.ddcd.a(Unknown Source)
at com.pivotal.jdbc.greenplumbase.ddcc.b(Unknown Source)
at com.pivotal.jdbc.greenplumbase.ddcc.a(Unknown Source)
at com.pivotal.jdbc.greenplum.wp.ddj.m(Unknown Source)
at com.pivotal.jdbc.greenplum.ddg.c(Unknown Source)
at com.pivotal.jdbc.greenplum.ddg.d(Unknown Source)
at com.pivotal.jdbc.greenplum.ddg.a(Unknown Source)
at com.pivotal.jdbc.greenplumbase.dddr.y(Unknown Source)
at com.pivotal.jdbc.greenplumbase.dddr.x(Unknown Source)
at com.pivotal.jdbc.greenplumbase.dddl.executeQuery(Unknown Source)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:61)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:210)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
at dzlab.GreenplumSparkSample$.read1(GreenplumPropertyFactory.scala:21)
at dzlab.GreenplumSparkSample$.main(GreenplumPropertyFactory.scala:35)
at dzlab.GreenplumSparkSample.main(GreenplumPropertyFactory.scala)
If I use a plain JDBC connection and read/write data the Java way, everything works fine, so I'm not sure what the issue is here.
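For reference, the plain-JDBC path that works might look like this (a minimal sketch; the host, database, and query are placeholders, and the connection code is only defined, not executed, since it needs a live cluster):

```scala
import java.sql.DriverManager

// Same DataDirect-style URL format as in the Spark attempt
// (host and database name are hypothetical placeholders)
val url = "jdbc:pivotal:greenplum://my-greenplum.example.com:5432;DatabaseName=template1"

// Plain JDBC: open a connection, run a query, iterate the ResultSet
def readTable(user: String, password: String, table: String): Unit = {
  val conn = DriverManager.getConnection(url, user, password)
  try {
    val rs = conn.createStatement().executeQuery(s"SELECT * FROM $table")
    while (rs.next()) println(rs.getString(1))
  } finally conn.close()
}
```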
Deployments of Greenplum in the cloud use pgBouncer with pool_mode set to transaction, and pgBouncer does not support prepared statements in this mode.
There are a few workarounds. The recommended one: "The proper way to do it for JDBC is adding prepareThreshold=0 parameter to connect string."
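That is, append the parameter to the URL you already build for Spark. A minimal sketch (host and database are placeholders; prepareThreshold=0 is a documented PostgreSQL JDBC driver parameter that disables server-side prepared statements, and whether the Pivotal DataDirect driver accepts the same property name in its semicolon-separated URL is an assumption worth checking against the driver docs):

```scala
// Hypothetical placeholder connection details
val server = "my-greenplum.example.com"
val port = 5432
val database = "template1"

// prepareThreshold=0 disables server-side prepared statements, so
// pgBouncer's transaction pooling no longer trips over "PS1 already exists"
val url = s"jdbc:pivotal:greenplum://$server:$port;DatabaseName=$database;prepareThreshold=0"
```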
The other workaround is to connect to Greenplum directly, bypassing pgBouncer: the database listens on port 6432 while pgBouncer listens on 5432.
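So the second workaround amounts to changing only the port in the connect string (host and database below are placeholders):

```scala
// Hypothetical placeholder connection details
val server = "my-greenplum.example.com"
val database = "template1"

// 5432 -> pgBouncer (transaction pooling; prepared statements fail)
// 6432 -> Greenplum master directly (prepared statements work)
val directUrl = s"jdbc:pivotal:greenplum://$server:6432;DatabaseName=$database"
```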