Schema capitalization (uppercase) problem when reading with Spark


Using Scala here:

val df = spark.read.format("jdbc").
  option("url", "<host url>").
  option("dbtable", "UPPERCASE_SCHEMA.table_name").
  option("user", "postgres").
  option("password", "<password>").
  option("numPartitions", 50).
  option("fetchsize", 20).
  load()

The database this code reads from has many schemas, and all of them are named in uppercase letters (e.g. UPPERCASE_SCHEMA).

No matter how I try to denote that the schema is in all caps, Spark converts it to lowercase, and the read then fails against the actual DB.

I've tried putting the schema name in a variable, explicitly marking it as uppercase, and doing the same from multiple languages, but no luck.

Would anyone know a workaround?

When I went into the actual DB (Postgres) and temporarily changed the schema to all lowercase, it worked absolutely fine.


Solution

  • Try setting spark.sql.caseSensitive to true (it is false by default):

    spark.conf.set("spark.sql.caseSensitive", true)
    

    Its definition is in the Spark source code: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L833

    In addition, you can see in the JDBCWriteSuite how it affects the JDBC connector: https://github.com/apache/spark/blob/ee95ec35b4f711fada4b62bc27281252850bb475/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala
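
If the setting alone does not change the behavior, another thing worth trying is to double-quote the schema name in the dbtable value. This is an assumption on my part and not part of the answer above: PostgreSQL folds unquoted identifiers to lowercase and only preserves case for double-quoted ones, and, as far as I know, Spark places the dbtable string into the generated query as-is. The PgIdent object below is a hypothetical helper, not a Spark or JDBC API; it only builds the quoted string.

```scala
// Hypothetical helper (not part of Spark): builds a schema-qualified table
// name with each part double-quoted so PostgreSQL keeps the original case.
object PgIdent {
  // Wrap an identifier in double quotes, doubling any embedded double quote.
  def quote(identifier: String): String =
    "\"" + identifier.replace("\"", "\"\"") + "\""

  // Join the quoted schema and the quoted table with a dot.
  def qualified(schema: String, table: String): String =
    quote(schema) + "." + quote(table)
}

// Intended use with the read from the question (names taken from it):
//   option("dbtable", PgIdent.qualified("UPPERCASE_SCHEMA", "table_name"))
```

Quoting the table name as well assumes the table really is stored as lowercase table_name; if it is not, pass the name exactly as it appears in the database.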