I have a problem publishing a Pub/Sub message from a Dataproc cluster. From a Cloud Function it works well with a service account, but from Dataproc I get this error:
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.PermissionDenied: 403 Request had insufficient authentication scopes. [reason: "ACCESS_TOKEN_SCOPE_INSUFFICIENT"
domain: "googleapis.com"
metadata {
  key: "method"
  value: "google.pubsub.v1.Publisher.Publish"
}
metadata {
  key: "service"
  value: "pubsub.googleapis.com"
}
]
The service account assigned to this cluster is supposed to have the Pub/Sub Publisher role, but the error above still appears.
As a workaround, I publish using the service account key (.json) file, but I believe this is bad practice because the secret (private key) is exposed and can be read from the code. I tried to use Secret Manager instead, but again the cluster has no access: I get the same 403 error as when publishing to Pub/Sub.
This is how I currently get the cluster to publish to the Pub/Sub topic:
from google.oauth2 import service_account

# Contents of the service account key (.json) file, hidden for security reasons
service_account_credentials = {...}
credentials = service_account.Credentials.from_service_account_info(
    service_account_credentials)
The code to publish:

import logging

from google.cloud import pubsub_v1


class EmailPublisher:
    def __init__(self, project_id: str, topic_id: str, credentials):
        self.publisher = pubsub_v1.PublisherClient(credentials=credentials)
        self.topic_path = self.publisher.topic_path(project_id, topic_id)

    def publish_message(self, message: str):
        data = str(message).encode("utf-8")
        # Extra keyword arguments are sent as message attributes
        future = self.publisher.publish(
            self.topic_path, data, origin="dataproc-python-pipeline", username="gcp"
        )
        # result() blocks until Pub/Sub returns the published message ID
        logging.info(future.result())
        logging.info("Published messages with custom attributes to %s", self.topic_path)
Is there a way to make the Dataproc cluster use its own service account and have permission to access GCP services?
Thank you,
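Dataproc runs on top of GCE (Compute Engine), so the cluster VMs need permission to access GCP services. Besides the IAM roles granted to the service account, this is controlled by access "scopes", which are set when the cluster is created, for example (in the case above, SCOPE is pubsub):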
gcloud dataproc clusters create CLUSTER_NAME \
--region=REGION \
--service-account=SERVICE_ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com \
--scopes=SCOPE
https://cloud.google.com/sdk/gcloud/reference/dataproc/clusters/create#--scopes
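To confirm from inside the cluster that missing scopes (rather than missing IAM roles) cause the 403, one option is to ask the Compute Engine metadata server which OAuth scopes the VM's access token carries. Below is a minimal stdlib-only sketch; the helper names and the `ACCEPTED_SCOPES` table are hypothetical, and the table assumes that Pub/Sub accepts either the `pubsub` or the `cloud-platform` scope, while Secret Manager accepts only `cloud-platform` (which would also explain the Secret Manager 403 in the question).

```python
from __future__ import annotations

import urllib.request

# Metadata server endpoint listing the OAuth scopes granted to the VM's
# default service account (reachable only from inside a GCE/Dataproc VM).
SCOPES_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/scopes"
)

# Hypothetical table (assumption): for each API, any one of these scopes
# is enough for the access token to be accepted.
ACCEPTED_SCOPES = {
    "pubsub": {
        "https://www.googleapis.com/auth/pubsub",
        "https://www.googleapis.com/auth/cloud-platform",
    },
    "secretmanager": {
        "https://www.googleapis.com/auth/cloud-platform",
    },
}


def fetch_granted_scopes(timeout: float = 2.0) -> set[str]:
    """Ask the metadata server which scopes this VM's access token carries."""
    request = urllib.request.Request(SCOPES_URL, headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    # The response lists one scope URL per line.
    return {line.strip() for line in body.splitlines() if line.strip()}


def missing_apis(granted: set[str]) -> list[str]:
    """Return, sorted, the APIs for which none of the accepted scopes was granted."""
    return sorted(
        api for api, accepted in ACCEPTED_SCOPES.items() if not granted & accepted
    )
```

On the master node, `missing_apis(fetch_granted_scopes())` returning a list that contains `"pubsub"` would point at the scopes. In that case, creating the cluster with `--scopes=pubsub` (Pub/Sub only) or `--scopes=cloud-platform` (leaving access control entirely to IAM roles) should let `pubsub_v1.PublisherClient()` use the cluster's service account without any key file.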