I'm running Databricks on Azure and trying to read a CSV file from a Google Cloud Storage (GCS) bucket using Spark. However, despite configuring Spark with a Google service account key, I'm encountering the following error:
Error getting access token from metadata server at: http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token
I've configured Spark with these settings to ensure it uses the service account for authentication, following this document: https://docs.databricks.com/en/connect/storage/gcs.html
spark.conf.set("spark.hadoop.google.cloud.auth.service.account.enable", "true")
spark.conf.set("spark.hadoop.fs.gs.auth.service.account.email", client_email)
spark.conf.set("spark.hadoop.fs.gs.project.id", project_id)
spark.conf.set("spark.hadoop.fs.gs.auth.service.account.private.key", private_key)
spark.conf.set("spark.hadoop.fs.gs.auth.service.account.private.key.id", private_key_id)
spark.conf.set("spark.hadoop.fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
spark.conf.set("spark.hadoop.fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
Attempting to read a test CSV file from my GCS bucket:
gcs_path = "gs://ddfsdfts/events/31dfsdfs4_2025_02_01_000000000000.csv"
df = spark.read.format("csv") \
.option("header", "true") \
.option("inferSchema", "true") \
.load(gcs_path)
df.show()
The error occurs when df.show() is called.
I've seen a few other questions like this, but no straightforward answers. Why is Spark trying to get a token from the metadata server?
Instead of setting the configs with spark.conf.set, which only takes effect on the driver node, try setting them at the cluster level (in the cluster's Spark config) as shown below, as described in the Databricks documentation you linked. Setting them at the cluster level propagates them to the worker nodes as well. When the executors don't have these credentials, the GCS connector falls back to the default metadata-server credential flow, which is why you see the request to http://169.254.169.254 failing.
spark.hadoop.google.cloud.auth.service.account.enable true
spark.hadoop.fs.gs.auth.service.account.email <client-email>
spark.hadoop.fs.gs.project.id <project-id>
spark.hadoop.fs.gs.auth.service.account.private.key {{secrets/scope/gsa_private_key}}
spark.hadoop.fs.gs.auth.service.account.private.key.id {{secrets/scope/gsa_private_key_id}}
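Once these values are in the cluster's Spark config and the cluster has been restarted, the notebook read should work without any per-notebook spark.conf.set calls. A minimal sketch, using a placeholder bucket path in place of yours:

# No auth configs needed in the notebook; the cluster-level Spark config supplies them.
gcs_path = "gs://<your-bucket>/events/<your-file>.csv"
df = spark.read.format("csv") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .load(gcs_path)
df.show()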