I've created an EMR cluster and specified the following in my Spark config:
hive.metastore.glue.role.arn: arn:aws:iam::omitted:role/EMR_DefaultRole
I can confirm from the EMR console in AWS that this value has been set correctly.
Within my job run logic, I execute
spark.sql("show databases").show()
This results in the following logs:
22/10/22 01:18:18 WARN HiveConf: HiveConf of name hive.metastore.glue.role.arn does not exist
22/10/22 01:18:18 ERROR AWSGlueClientFactory: Unable to build AWSGlueClient: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
22/10/22 01:18:18 WARN Hive: Failed to access metastore. This class should not accessed in runtime.
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Unable to build AWSGlueClient: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException)
at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1237)
at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:175)
at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:167)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
at org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:183)
at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:117)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
...
It seems my Glue client cannot be instantiated because that Glue role ARN is not found in my conf.
I would really appreciate any ideas or debugging suggestions -- thanks in advance :)
Try setting the com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory class as the metastore client factory using the hive.metastore.client.factory.class property.
For a full example of setting this configuration in code, see the Metastore configuration documentation page; for an example using the spark-hive-site classification, see the Use the AWS Glue Data Catalog as the metastore for Spark SQL page. Also note that in the first case you need to prefix the property name with spark.hadoop., as sketched below.
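A minimal PySpark sketch of the code-based approach, assuming you build the SparkSession yourself (the app name is arbitrary):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("glue-catalog-example")  # hypothetical app name
    # Hive properties set programmatically must carry the spark.hadoop. prefix
    .config(
        "spark.hadoop.hive.metastore.client.factory.class",
        "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
    )
    .enableHiveSupport()
    .getOrCreate()
)

spark.sql("show databases").show()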
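And a sketch of the equivalent EMR configuration using the spark-hive-site classification (no spark.hadoop. prefix needed here), supplied when the cluster or job is created:

[
  {
    "Classification": "spark-hive-site",
    "Properties": {
      "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
    }
  }
]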