After installing Kubeflow 0.7 on a new GKE cluster (via https://deploy.kubeflow.cloud), I configured OAuth and Workload Identity following their respective tutorials.
In my pipeline I need to access a GCS bucket in the same project, and it seems that I do not have permission for this operation.
However, the cluster does have access to GCR and enough Storage rights to pull the Docker images and run the code. It fails to download from other buckets when requested from code, even when those buckets are in the same project.
The code uses the default method of authentication:

from google.cloud import storage

storage_client = storage.Client(project_id)
bucket = storage_client.get_bucket(bucket)
Does anyone have ideas on how to solve this, and how to prevent it from also happening with access to BigQuery (which will be queried after these files are downloaded)?
google.api_core.exceptions.Forbidden: 403 GET https://storage.googleapis.com/storage/v1/b/$BUCKET?projection=noAcl Primary: /namespaces/$PROJECT_ID.svc.id.goog with additional claims does not have storage.buckets.get access to $BUCKET.
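For reference, the BigQuery step will rely on the same default credentials, so I expect it to hit the same kind of 403. A minimal sketch of what that access would look like (the query is just a placeholder):

from google.cloud import bigquery

# Uses the same default credentials as the storage client above;
# the query is only a placeholder
bq_client = bigquery.Client(project=project_id)
rows = bq_client.query('SELECT 1').result()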
OK, so for anyone with similar problems: I forgot to apply the following snippet to the step in the pipeline code:

step.apply(gcp.use_gcp_secret('user-gcp-sa'))

This works even with components loaded via kfp.components.load_component_from_file().
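For context, here is a minimal sketch of how the secret gets applied in the pipeline definition (the component file name and pipeline name are placeholders):

import kfp
from kfp import dsl, gcp

# Hypothetical component definition file; substitute your own
download_op = kfp.components.load_component_from_file('download_component.yaml')

@dsl.pipeline(name='gcs-access', description='Applies user-gcp-sa to a pipeline step')
def my_pipeline():
    step = download_op()
    # Mounts the user-gcp-sa Kubernetes secret and points
    # GOOGLE_APPLICATION_CREDENTIALS at the key file inside it, so the default
    # storage/BigQuery clients in the container authenticate as that service account
    step.apply(gcp.use_gcp_secret('user-gcp-sa'))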
Figured this out thanks to the amazing folks in the Kubeflow Slack channel!