Search code examples
apache-sparkkuberneteshiveapache-spark-sqlhive-metastore

Spark application syncing with Hive metastore - "There is no primary group for UGI spark" error


I'm running a simple Spark job on Kubernetes cluster that writes data to HDFS with Hive catologization. For whatever reason my app fails to run Spark SQL commands with the following exception:

21/09/22 09:23:54 ERROR SplunkStreamListener: |exception=org.apache.spark.sql.AnalysisException
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: java.io.IOException There is no primary group for UGI spark (auth:SIMPLE));
    at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
    at org.apache.spark.sql.hive.HiveExternalCatalog.createDatabase(HiveExternalCatalog.scala:183)
    at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.createDatabase(ExternalCatalogWithListener.scala:47)
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:211)
    at org.apache.spark.sql.execution.command.CreateDatabaseCommand.run(ddl.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
    at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
    at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
    at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3370)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
    at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withAction(Dataset.scala:3369)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643)

I'm connecting to Hive metastore via Thrift URL. The docker container runs the application as non-root user. Are there some kind of groups I need the user to be added to sync with the metastore?


Solution

  • Try add this before setting up the spark context

    System.setProperty("HADOOP_USER_NAME", "root")