Tags: azure, apache-spark, pyspark, databricks, azure-databricks

How to change the Spark user running jobs in Azure Databricks?


I am using Spark on Azure Databricks 5.5. I submit Spark jobs through the Databricks workspace UI via Jobs, Notebooks, and spark-submit. The jobs are submitted successfully, and either new Databricks clusters are spawned or existing ones are reused. However, the user running the job on the executor nodes is root by default. Is it possible to change the user that runs the jobs on Azure Databricks (which inherently doesn't allow SSH access)?

Usually, when I use the spark-submit CLI on a cluster with shell access, I change the user with sudo: sudo -u exampleuser spark-submit.... In that case, the user 'exampleuser' exists on all nodes of the cluster. So I would like to know whether it is possible to change the user running Spark jobs on Azure Databricks clusters.
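For reference, one way to confirm which OS user the driver and the executors are actually running as is a quick check from a notebook. The snippet below is a minimal sketch, assuming a Databricks notebook attached to the cluster with the usual spark SparkSession variable available:

    # Minimal sketch: report the OS user on the driver and on the executors.
    # Assumes a Databricks notebook where `spark` is the active SparkSession.
    import getpass

    # User running the driver process
    print("driver user:", getpass.getuser())

    # User running the executor processes (one task per partition)
    executor_users = (
        spark.sparkContext
        .parallelize(range(4), 4)
        .map(lambda _: getpass.getuser())
        .distinct()
        .collect()
    )
    print("executor users:", executor_users)

On a default cluster this should show root for both, which is the behaviour described above.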


Solution

  • After discussing with the Azure Databricks team, the correct way to change the user running Spark jobs is to set the environment variable HADOOP_USER_NAME during cluster creation. This changes the effective user running the Spark jobs from root to $HADOOP_USER_NAME (see the sketch below).
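In practice, the variable can be entered under Advanced Options > Spark > Environment Variables when creating the cluster in the UI, or supplied in the cluster spec when creating the cluster programmatically. The sketch below shows one possible way to do the latter, assuming the Databricks Clusters REST API (2.0); the workspace URL, token, node type, and user name are placeholders, not values from the original post:

    # Minimal sketch: create a cluster with HADOOP_USER_NAME set, via the
    # Databricks Clusters API 2.0. Host, token, and node type are placeholders.
    import requests

    DATABRICKS_HOST = "https://<your-workspace>.azuredatabricks.net"  # placeholder
    TOKEN = "<personal-access-token>"                                  # placeholder

    cluster_spec = {
        "cluster_name": "hadoop-user-demo",
        "spark_version": "5.5.x-scala2.11",   # Databricks Runtime 5.5, as in the question
        "node_type_id": "Standard_DS3_v2",    # placeholder node type
        "num_workers": 2,
        # The relevant part: jobs on this cluster run as this user instead of root
        "spark_env_vars": {
            "HADOOP_USER_NAME": "exampleuser"
        },
    }

    resp = requests.post(
        DATABRICKS_HOST + "/api/2.0/clusters/create",
        headers={"Authorization": "Bearer " + TOKEN},
        json=cluster_spec,
    )
    resp.raise_for_status()
    print("created cluster:", resp.json()["cluster_id"])

The same spark_env_vars field is also accepted in the new_cluster block of a job definition, so job clusters can be configured the same way.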