apache-spark hive apache-spark-sql aws-glue aws-glue-data-catalog

Create database spark sql


I'm using Spark 2.4.4 with the AWS Glue Data Catalog.

In my Spark job, I need to create a database in Glue if it doesn't already exist. I'm using the following Spark SQL statement to do so:

spark.sql("CREATE DATABASE IF NOT EXISTS %s".format(hiveDatabase));

It works as expected in spark-shell: the database gets created in Glue. But when I run the same piece of code using spark-submit, the database is not created. Is there a commit/flush that I need to do when using spark-submit?

EDIT: I'm getting different results for SHOW DATABASES in spark-shell and spark-submit (spark-shell output first, then spark-submit):

+---------------------+
|databaseName         |
+---------------------+
|all                  |
|default              |
|hive-db              |
|navi-database-account|
|navi-par             |
|testdb               |
+---------------------+


+------------+
|databaseName|
+------------+
|default     |
+------------+

It looks like spark-submit is creating the database somewhere, but not in Glue.


Solution

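  • I needed to add the following config to the Spark session used by the submitted job:

    ("spark.sql.catalogImplementation", "hive")

  • Without it, a session built in application code defaults to Spark's in-memory catalog, whereas spark-shell turns on Hive support by default when the Hive classes are available. That is why only spark-shell saw the Glue databases, and why the database created under spark-submit never reached Glue.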
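For illustration, here is a minimal sketch of how that setting might be applied in the job itself. It assumes the cluster is already configured to use Glue as its Hive metastore; the object name, app name, and the fallback database name are hypothetical and not from the original job.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical job object; only the .config(...) line comes from the solution above.
object CreateGlueDatabase {
  def main(args: Array[String]): Unit = {
    // Hypothetical: take the database name from the first argument.
    val hiveDatabase = args.headOption.getOrElse("testdb")

    val spark = SparkSession.builder()
      .appName("create-glue-database")
      // Static setting: it has to be in place before the session is created.
      .config("spark.sql.catalogImplementation", "hive")
      .getOrCreate()

    // Print the active catalog implementation as a sanity check.
    println(spark.conf.get("spark.sql.catalogImplementation"))

    // Backticks keep names containing hyphens (e.g. hive-db) valid.
    spark.sql(s"CREATE DATABASE IF NOT EXISTS `$hiveDatabase`")
    spark.sql("SHOW DATABASES").show(false)

    spark.stop()
  }
}
```

Calling `.enableHiveSupport()` on the builder sets the same property, and it should also be possible to pass it at launch time with `--conf spark.sql.catalogImplementation=hive` on the spark-submit command line instead of hard-coding it.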