Tags: python, google-cloud-platform, service-accounts, google-cloud-vertex-ai, vertex-ai-pipeline

Custom Service Account with KFP pipelines in Vertex AI


I'm trying to use a specific (non-default) service account for running KFP pipelines in Vertex AI. JSON keys are not an option.

Ideally, the code would get both the project ID and credentials via google.auth.default(), as suggested in the google.auth user guide.

So far, I've tried:

  1. Using the deprecated kfp.v2.google.client.AIPlatformClient: client instantiated with the project ID specified, pipeline run with create_run_from_job_spec and the service_account keyword argument
  2. Using google.cloud.aiplatform.pipeline_jobs.PipelineJob: object instantiated with the project ID, pipeline run with submit and the service_account kwarg
  3. Creating a new run from the cloud console UI (using the JSON file created by compiling the pipeline), with the service account specified

I've tried all three with both the actual pipeline (running on custom-built containers) and a minimal working example (using lightweight Python components). In all cases, when I run creds, project = google.auth.default() and then print the project and creds.service_account_email, I get a project ID I don't recognize (always the same one in all cases) and default for the service account email.

I think I must be doing something wrong, but I can't figure out what. It seems like the configuration I'm passing to the pipeline run isn't being used at all.

For reference, the MWE:

from kfp.v2 import dsl

@dsl.component(packages_to_install=['google-auth'])
def check_auth(name: str) -> str:
    import google.auth
    creds, project = google.auth.default()
    print(f'Project is: {project}')
    print(f'Got creds for: {creds.service_account_email}')
    return project

@dsl.pipeline(
    name='adc-mwe-pipeline'
)
def pipeline() -> str:
    auth_check = check_auth(name='name')
    return auth_check.output


from google.cloud.aiplatform import pipeline_jobs
from kfp.v2 import compiler

compiler.Compiler().compile(pipeline_func=pipeline, package_path='mwe.json')

start_pipeline = pipeline_jobs.PipelineJob(
    display_name='mwe',
    template_path='mwe.json',
    location='some-location',
    project='my-project',
    enable_caching=False
)

start_pipeline.submit(service_account="my-service-account")
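One thing worth double-checking: the service_account parameter expects the account's full email address (name@project.iam.gserviceaccount.com), not just the short name shown in the snippet above. A purely illustrative helper for building that form:

```python
def sa_email(name: str, project: str) -> str:
    """Build the full email address of a service account from its short name.

    Vertex AI's service_account parameter expects this full form,
    not the bare account name.
    """
    return f"{name}@{project}.iam.gserviceaccount.com"

print(sa_email("my-service-account", "my-project"))
# my-service-account@my-project.iam.gserviceaccount.com
```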

Solution

  • It turns out the correct way to use Application Default Credentials is to not pass credentials explicitly at all.

    So, for example, with BigQuery:

    from google.cloud import bigquery
    client = bigquery.Client(project='my_project')
    query_job = client.query(some_sql_query)
    

    Running this on a Compute Engine instance, or in a pipeline component, will use the credentials of the service account attached to the instance, or the service account used to submit the pipeline (as in the question).

    Hope this helps someone else. It's quite frustrating that this isn't documented clearly.
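For context on why this works: google.auth.default() resolves credentials in a fixed order: first the GOOGLE_APPLICATION_CREDENTIALS environment variable (pointing at a key file), then gcloud's application-default credentials, and finally the metadata server on Compute Engine / Vertex AI, which serves the attached or submitting service account. A small sketch of inspecting the first step (illustrative only; the later steps live inside the library):

```python
import os

# ADC first checks this environment variable for a key-file path.
# Inside a Vertex AI pipeline it is normally unset, so the lookup
# falls through to the metadata server, which serves the service
# account the job was submitted with.
key_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
if key_path is None:
    print("No explicit key file; ADC will use gcloud or the metadata server.")
else:
    print(f"ADC will load credentials from {key_path}")
```

This is also why passing service_account to submit() is enough: the component code never names a credential, so the metadata server's answer, i.e. the submitted service account, is what google.auth.default() returns.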