Using Python I would like to get a list of all my Dataproc clusters on Google Cloud.
I have service account credentials stored in a JSON keyfile whose location is referred to by env var GOOGLE_APPLICATION_CREDENTIALS. Here is the code I have so far:
    import os

    import googleapiclient.discovery
    from oauth2client.client import GoogleCredentials


    def build_dataproc_service(credentials):
        # Build a client for the Dataproc v1 REST API.
        return googleapiclient.discovery.build("dataproc", "v1", credentials=credentials)


    def list_clusters():
        # Application Default Credentials, read from GOOGLE_APPLICATION_CREDENTIALS.
        credentials = GoogleCredentials.get_application_default()
        dataproc = build_dataproc_service(credentials)
        # projectId is hardcoded here; "REGION" is a placeholder for a real region.
        clusters = dataproc.projects().regions().clusters().list(
            projectId="my-project", region="REGION"
        ).execute()
        return clusters


    if __name__ == "__main__":
        list_clusters()
As you can see, I have hardcoded the projectId ("my-project"). Given that the projectId exists in the JSON keyfile, I had hoped I could obtain it simply by interrogating a property of the credentials object, but no such property exists. The projectId does exist embedded within the credentials._service_account_email string property, but extracting it from there is clunky and feels wrong.
I assume there must be a better way. How can I obtain the projectId of the project in which the service account resides?
Note that initially I intend for this code to run in a Docker container on a Google Compute Engine instance; however, one day I may want to run it on GKE. I am not sure whether that affects the answer.
A formal way to think about this is that while a projectId is sometimes a property of a service account, it is not generally a property of a long-lived credential. For example, think of the offline-installed personal credential (if any) that you use with the gcloud CLI, which is associated with your Google account/email address. That email identity doesn't reside in any cloud project, and yet it can be used to derive a GoogleCredential object.
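To make that distinction concrete, here is a minimal sketch that reads the project ID straight from the keyfile when there is one. It assumes the file follows the usual layout of a downloaded service-account key (a "type" of "service_account" plus a "project_id" field); the function name project_id_from_keyfile is made up for this illustration. For a personal gcloud credential the file has no such field, so the function returns None.

```python
import json
import os


def project_id_from_keyfile(path=None):
    """Return the project_id stored in a credentials JSON file, or None.

    Hypothetical helper: assumes the standard service-account key layout.
    """
    # Fall back to the env var the question already relies on.
    path = path or os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return None
    with open(path) as f:
        info = json.load(f)
    # Only service-account keys describe an identity that lives in a project;
    # user credentials (type "authorized_user") do not.
    if info.get("type") != "service_account":
        return None
    return info.get("project_id")
```

Note that this only tells you where the service account lives, which is not necessarily the project whose clusters you want to list.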
Technically, if you want to do it "properly", you'd need a master service account that has permission to GET service account descriptions in all the projects that hold the actual service accounts you plan to use, and then call the IAM API's projects.serviceAccounts.get on the service-account email address, not on the "credential" object. The response identifies the project ID in which the service account resides. This is equivalent to the gcloud command:
gcloud iam service-accounts describe [email protected]
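In Python, the same lookup could look roughly like the sketch below, using the discovery-based client from the question. The helper names are invented for this example, and it assumes the caller's credentials are allowed to read the service account; the "-" in the resource name is the wildcard the IAM API accepts in place of a project ID.

```python
def service_account_resource_name(email):
    # "-" lets the IAM API infer the project from the service account itself.
    return f"projects/-/serviceAccounts/{email}"


def get_service_account_project(email, iam_service=None):
    """Return the ID of the project that owns the given service account.

    Hypothetical helper; iam_service can be injected for testing.
    """
    if iam_service is None:
        # Imported lazily so the helper above works without the client library.
        import googleapiclient.discovery

        # With no explicit credentials, Application Default Credentials are used.
        iam_service = googleapiclient.discovery.build("iam", "v1")
    account = (
        iam_service.projects()
        .serviceAccounts()
        .get(name=service_account_resource_name(email))
        .execute()
    )
    return account["projectId"]
```

You would call it with the service account's email address, for example get_service_account_project("my-sa@my-project.iam.gserviceaccount.com") (a made-up address).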
However, as Dagang says, it's often going to backfire in the long run to start baking in assumptions that the service account will only be used for operations on the projects in which it resides. In particular, while service account resources themselves live inside projects, they are often used in a cross-project manner. One common operational pattern is to use a single GCP project to manage a large number of service accounts that are then granted various fine-grained access to resources in other GCP projects.
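One way to avoid baking in that assumption is to treat the target project as configuration that is separate from the credential. The sketch below is only an illustration of that idea: resolve_project_id is a made-up helper, and the choice of the GOOGLE_CLOUD_PROJECT environment variable as the fallback is an assumption about how you deploy, not something the credential requires.

```python
import os


def resolve_project_id(explicit=None, environ=None):
    """Pick the project to operate on, independent of the credential.

    Hypothetical helper: prefers an explicit argument, then the
    GOOGLE_CLOUD_PROJECT environment variable.
    """
    environ = os.environ if environ is None else environ
    project_id = explicit or environ.get("GOOGLE_CLOUD_PROJECT")
    if not project_id:
        raise ValueError(
            "No project ID given; pass one explicitly or set GOOGLE_CLOUD_PROJECT"
        )
    return project_id
```

The result can then be passed as projectId to the clusters().list(...) call, so the same service account works against any project it has been granted access to.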