Tags: google-cloud-platform, publish-subscribe, google-cloud-pubsub, google-cloud-dataproc, dataproc

Pub/Sub Publish message from Dataproc cluster using Python: ACCESS_TOKEN_SCOPE_INSUFFICIENT


I have a problem publishing a Pub/Sub message from a Dataproc cluster. From a Cloud Function it works well with a service account, but from Dataproc I get this error:

raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.PermissionDenied: 403 Request had insufficient authentication scopes. [reason: "ACCESS_TOKEN_SCOPE_INSUFFICIENT"
domain: "googleapis.com"
metadata {
  key: "method"
  value: "google.pubsub.v1.Publisher.Publish"
}

metadata {
  key: "service"
  value: "pubsub.googleapis.com"
}
]
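One way to confirm this is a scope problem (rather than a missing IAM role) is to ask the GCE metadata server, from a cluster node, which OAuth scopes the VM's access token carries; a stdlib-only diagnostic sketch:

    # Diagnostic: list the OAuth scopes granted to the VM's default
    # service account by the instance's scopes setting.
    from urllib.request import Request, urlopen

    req = Request(
        "http://metadata.google.internal/computeMetadata/v1/"
        "instance/service-accounts/default/scopes",
        headers={"Metadata-Flavor": "Google"},
    )
    print(urlopen(req).read().decode())

If https://www.googleapis.com/auth/pubsub (or a broader scope such as cloud-platform) is not in the list, the token cannot call the Pub/Sub API no matter which IAM roles the service account has.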

The service account assigned to this cluster is supposed to have the Pub/Sub Publisher role, but the error above still appears.

There is a workaround I have used to get past this issue: using the service account key (.json) file to publish. I believe this is bad practice, as the secrets (private key) are exposed and can be read from the code. I also tried Secret Manager, but again there is no access from the cluster: the same 403 error appears when publishing to Pub/Sub.

This is how I currently get the cluster to publish to the Pub/Sub topic:

from google.oauth2 import service_account

service_account_credentials = {"""  hidden for security reasons lol """}

credentials = service_account.Credentials.from_service_account_info(
    service_account_credentials
)
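If a key file must be used at all, a slightly safer variant is to load it from a path on the cluster instead of pasting the JSON into the source; the path below is a hypothetical placeholder:

    from google.oauth2 import service_account

    # Hypothetical path; the key file would need to be staged on the
    # cluster outside of version control.
    credentials = service_account.Credentials.from_service_account_file(
        "/path/to/service-account-key.json"
    )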

The code to publish:

import logging

from google.cloud import pubsub_v1


class EmailPublisher:
    def __init__(self, project_id: str, topic_id: str, credentials):
        self.publisher = pubsub_v1.PublisherClient(credentials=credentials)
        self.topic_path = self.publisher.topic_path(project_id, topic_id)

    def publish_message(self, message: str):
        data = str(message).encode("utf-8")
        future = self.publisher.publish(
            self.topic_path, data, origin="dataproc-python-pipeline", username="gcp"
        )
        logging.info(future.result())
        logging.info("Published messages with custom attributes to %s", self.topic_path)

Is there any way to make the Dataproc cluster use the assigned service account's permissions so it can access GCP services?

Thank you,


Solution

  • Dataproc runs on top of GCE, so the VMs need permission to access GCP services. This is granted using "scopes", for example:

    In the case above the scope is pubsub (an alias for https://www.googleapis.com/auth/pubsub):

    gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --service-account=SERVICE_ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com \
    --scopes=SCOPE
    

    https://cloud.google.com/sdk/gcloud/reference/dataproc/clusters/create#--scopes
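Once the cluster is created with the pubsub (or broader cloud-platform) scope, the client library can rely on Application Default Credentials from the VM metadata server and no key file is needed; a minimal sketch, with my-project-id and my-topic-id as placeholders:

    from google.cloud import pubsub_v1

    # No explicit credentials: on a correctly scoped Dataproc VM the client
    # picks up the cluster's service account via Application Default Credentials.
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path("my-project-id", "my-topic-id")
    future = publisher.publish(topic_path, b"hello")
    print(future.result())

Note that scopes only control what the token is allowed to request; the service account itself still needs the Pub/Sub Publisher IAM role on the topic or project.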