
Spark SQL Custom Metastore


In Hive, we can set up different RDBMSs as the metastore and have Hive store all of its metadata there. In addition, HiveServer2 lets Hive listen for client requests and serve them.

Many documents say that Spark SQL can be used in a similar fashion. Can we set up Oracle (as an example) as the metastore for Spark SQL? If so, could someone explain how to set it up?

Thanks!


Solution

  • Spark uses the Hive Metastore as its external metastore, and you choose the backing database, so an Oracle database is fine. If you configure nothing, Spark falls back to an embedded Derby database, which is adequate for single-user experimentation, a pseudo-distributed setup, or a small non-production cluster. To use an external metastore, you need to configure the connection to it explicitly.

    On AWS EMR, you can use the AWS Glue Data Catalog as the external metastore for Spark.

    Some vendor distributions impose their own specifics here as well.
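
As a rough illustration of "configure appropriately", the sketch below generates a minimal hive-site.xml that points the Hive Metastore at an Oracle database. The `javax.jdo.option.*` property names and the `oracle.jdbc.OracleDriver` class are the standard ones; the host, port, service name, user, and password are placeholders invented for this example, and the helper `build_hive_site` is hypothetical, not part of Spark or Hive.

```python
from xml.sax.saxutils import escape


def build_hive_site(properties):
    """Render a dict of Hive properties as hive-site.xml text (hypothetical helper)."""
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name, value in properties.items():
        lines += [
            "  <property>",
            f"    <name>{escape(name)}</name>",
            f"    <value>{escape(value)}</value>",
            "  </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"


# Placeholder connection details: replace host, port, service name,
# user, and password with the values for your own Oracle instance.
metastore_properties = {
    "javax.jdo.option.ConnectionURL": "jdbc:oracle:thin:@//db.example.com:1521/ORCLPDB1",
    "javax.jdo.option.ConnectionDriverName": "oracle.jdbc.OracleDriver",
    "javax.jdo.option.ConnectionUserName": "hive_user",
    "javax.jdo.option.ConnectionPassword": "change_me",
}

hive_site_xml = build_hive_site(metastore_properties)
print(hive_site_xml)
```

Spark reads Hive settings from a hive-site.xml placed in its conf/ directory, and the Oracle JDBC driver jar has to be on the classpath. The metastore schema must also exist in the database; Hive's schematool (for example `schematool -dbType oracle -initSchema`) is the usual way to create it. If a standalone metastore service is already running, setting `hive.metastore.uris` to its Thrift address is an alternative to giving Spark the JDBC details directly.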