python google-compute-engine google-cloud-dataproc

How do I get projectId from GoogleCredentials?


Using Python I would like to get a list of all my Dataproc clusters on Google Cloud.

I have service account credentials stored in a JSON keyfile whose location is referred to by env var GOOGLE_APPLICATION_CREDENTIALS. Here is the code I have so far:

import os
import googleapiclient.discovery
from oauth2client.client import GoogleCredentials


def build_dataproc_service(credentials):
    return googleapiclient.discovery.build("dataproc", "v1", credentials=credentials)


def list_clusters():
    credentials = GoogleCredentials.get_application_default()
    dataproc = build_dataproc_service(credentials)
    clusters = dataproc.projects().regions().clusters().list(projectId="my-project", region="REGION").execute()
    return clusters


if __name__ == "__main__":
    list_clusters()

As you can see, I have hardcoded the projectId ("my-project"). Given that the projectId exists in the JSON keyfile, I had hoped I could obtain it simply by interrogating a property of the credentials object, but no such property exists. The projectId is embedded within the credentials._service_account_email string property, but extracting it from there is clunky and feels wrong.

I assume there must be a better way. How can I obtain the projectId of the project in which the service account resides?

Note that initially I intend for this code to run in a Docker container on a Google Compute Engine instance; however, one day I may want to run it on GKE. I am not sure whether that affects the answer.


Solution

  • A formal way to think about this is that while projectId is sometimes a property of a service account, a projectId is not generally a property of a long-lived credential. For example, think of the offline-installed personal credential you use with the gcloud CLI, if any, which is associated with your Google account/email address. That email identity doesn't reside in any cloud project, and yet it can be used to derive a GoogleCredentials object.

    Technically, if you want to do it "properly", you'd need a master service account that has permission to GET service account descriptions in all the projects that hold the actual service accounts you plan to use. You would then call the IAM API's projects.serviceAccounts.get on the service account's email address, not on the "credential" object. The response identifies the project ID in which the service account resides. This is equivalent to the gcloud command:

    gcloud iam service-accounts describe SERVICE_ACCOUNT_EMAIL
    

    However, as Dagang says, it's often going to backfire in the long run to start baking in assumptions that the service account will only be used for operations on the projects in which it resides. In particular, while service account resources themselves live inside projects, they are often used in a cross-project manner. One common operational pattern is to use a single GCP project to manage a large number of service accounts that are then granted various fine-grained access to resources in other GCP projects.
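
A minimal sketch of both points above, under some stated assumptions. It assumes `iam_service` is an IAM v1 client you have already built, for example with googleapiclient.discovery.build("iam", "v1", credentials=credentials), and that the calling identity is allowed to read the service account. The `-` in the resource name is the IAM API's wildcard for "infer the project from the account email". The function names and the `TARGET_PROJECT_ID` environment variable are hypothetical, chosen only for illustration: the idea is to let an explicitly configured project win, and to treat the service account's home project as a fallback rather than a built-in assumption.

```python
import os


def service_account_home_project(iam_service, service_account_email):
    """Return the ID of the project in which the service account resides.

    `iam_service` is assumed to be an IAM v1 client, e.g. one built with
    googleapiclient.discovery.build("iam", "v1", credentials=...).
    """
    # "-" is a wildcard: IAM infers the project from the account email.
    name = "projects/-/serviceAccounts/{}".format(service_account_email)
    account = iam_service.projects().serviceAccounts().get(name=name).execute()
    return account["projectId"]


def target_project(iam_service, service_account_email,
                   env_var="TARGET_PROJECT_ID"):
    """Pick the project to operate on.

    An explicitly configured project (hypothetical env var) takes priority,
    because a service account is often used outside its home project.
    """
    explicit = os.environ.get(env_var)
    if explicit:
        return explicit
    return service_account_home_project(iam_service, service_account_email)
```

With this shape, list_clusters() from the question could take the result of target_project(...) as its projectId instead of the hardcoded "my-project", and a cross-project deployment only needs the environment variable set.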