I've created an EMR cluster and specified the following in my Spark config:
hive.metastore.glue.role.arn: arn:aws:iam::omitted:role/EMR_DefaultRole
I can confirm from the EMR console in AWS that this value has been set correctly.
Within my job run logic, I execute
spark.sql("show databases").show()
This results in the following logs:
22/10/22 01:18:18 WARN HiveConf: HiveConf of name hive.metastore.glue.role.arn does not exist
22/10/22 01:18:18 ERROR AWSGlueClientFactory: Unable to build AWSGlueClient: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
22/10/22 01:18:18 WARN Hive: Failed to access metastore. This class should not accessed in runtime.
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Unable to build AWSGlueClient: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException)
at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1237)
at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:175)
at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:167)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
at org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:183)
at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:117)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
...
It seems my Glue client cannot be instantiated because that Glue role ARN is not found in my conf.
I would really appreciate any ideas or debugging suggestions -- thanks in advance :)
Try setting the com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory class as the metastore client factory using the hive.metastore.client.factory.class property.
For a full example of setting this configuration in code, see the Metastore configuration documentation page; for an example using the spark-hive-site classification, see the Use the AWS Glue Data Catalog as the metastore for Spark SQL page. Also note that in the first case you need to prefix the property name with spark.hadoop., as sketched below.
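A minimal PySpark sketch of the code-based approach, assuming you build the SparkSession yourself (the app name is arbitrary):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("glue-catalog-example")  # hypothetical app name
    # Hive properties set programmatically must carry the spark.hadoop. prefix
    .config(
        "spark.hadoop.hive.metastore.client.factory.class",
        "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
    )
    .enableHiveSupport()
    .getOrCreate()
)

spark.sql("show databases").show()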
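And a sketch of the equivalent EMR configuration using the spark-hive-site classification (no spark.hadoop. prefix needed here), supplied when the cluster or job is created:

[
  {
    "Classification": "spark-hive-site",
    "Properties": {
      "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
    }
  }
]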