Tags: google-cloud-platform, dask, dask-distributed, google-iam

Where in GCP can I see rejected API requests due to lacking permissions?


My question is technically independent of the context but I introduce it for the sake of clarity.

# export GOOGLE_APPLICATION_CREDENTIALS="/home/raffael/repos/dask/playground-310111-1d035231463d.json"

from dask_cloudprovider.gcp import GCPCluster

cluster = GCPCluster(
    projectid="project_id", 
    n_workers=1, 
    source_image="projects/ubuntu-os-cloud/global/images/ubuntu-minimal-1804-bionic-v20210325",
    zone="europe-west1-b",
)

I want to create a Dask cluster using the service account and code above.

If I attach the Project Owner role to that SA, it works. If I attach only the Compute Admin role to that SA, it fails.

For completeness, here is the error message, though it says little beyond that the supposedly created scheduler instance doesn't exist.

Launching cluster with the following configuration: 
  Source Image: projects/ubuntu-os-cloud/global/images/ubuntu-minimal-1804-bionic-v20210325 
  Docker Image: daskdev/dask:latest 
  Machine Type: n1-standard-1 
  Filesytsem Size: 50 
  Disk Type: pd-standard 
  N-GPU Type:  
  Zone: europe-west1-b 
Creating scheduler instance
Failed to find running VMI...
{'id': 'projects/project_id/zones/europe-west1-b/instances', 'selfLink': 'https://www.googleapis.com/compute/v1/projects/project_id/zones/europe-west1-b/instances', 'kind': 'compute#instanceList'}
Traceback (most recent call last):
  File "gcp_cluster.py", line 9, in <module>
    cluster = GCPCluster(
  File "/home/raffael/miniconda3/envs/dask/lib/python3.8/site-packages/dask_cloudprovider/gcp/instances.py", line 603, in __init__
    super().__init__(debug=debug, **kwargs)
  [...]
  File "/home/raffael/miniconda3/envs/dask/lib/python3.8/site-packages/dask_cloudprovider/gcp/instances.py", line 209, in create_vm
    while await self.update_status() != "RUNNING":
  File "/home/raffael/miniconda3/envs/dask/lib/python3.8/site-packages/dask_cloudprovider/gcp/instances.py", line 255, in update_status
    raise Exception(f"Missing Instance {self.name}")
Exception: Missing Instance dask-9645b903-scheduler

So the big question is: which permission is missing?

In Operations Logging I found the following error, which correlates with the failure, but I'm not sure whether it tells me which service was used and which permission is required.

[Screenshot of the log entry in Operations Logging; image not available]

@type and logName mention "audit", so I searched all Project Owner permissions for "audit" and found container.auditSinks.* as a "promising" candidate, but adding those permissions doesn't solve the problem. I suspect they are unrelated anyway, but I'm not sure what @type and logName refer to, so I gave it a try.


Back to my question: does GCP tell me somewhere which API calls were rejected due to missing permissions?

(And of course, if you happen to know which permissions or roles I need to add, that would be splendid.)


Solution

  • Try granting the role roles/iam.serviceAccountUser to your SA at the project level. This allows the SA to impersonate other service accounts in the project, including any SA used by Compute Engine. The documentation notes that when you grant the Compute Admin role, you still need to assign the Service Account User role at the project or SA level.

    If this doesn't work, please share whether you are following a tutorial or manual for this Dask cluster, or the steps to reproduce the problem, so we can find a solution for this scenario.
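The suggestion above, together with a way to look for rejected calls, might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a verified fix: the service account email is a hypothetical placeholder, and the helper function names are made up for illustration. It assumes the gcloud CLI is installed and authenticated, and that the rejected calls were recorded in Cloud Audit Logs (the @type and logName fields in the question point to such an entry). The filter matches entries whose status code is 7, the numeric value of PERMISSION_DENIED; in matching entries, protoPayload.authorizationInfo should list the permission that was checked and whether it was granted.

```python
import shlex
from typing import List

# Placeholders (assumptions), not values taken from the question.
PROJECT_ID = "project_id"
SA_EMAIL = "dask-sa@project_id.iam.gserviceaccount.com"  # hypothetical SA email


def grant_service_account_user_cmd(project_id: str, sa_email: str) -> List[str]:
    """Build the gcloud command that grants roles/iam.serviceAccountUser
    to the service account at the project level."""
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member=serviceAccount:{sa_email}",
        "--role=roles/iam.serviceAccountUser",
    ]


def denied_requests_filter(sa_email: str) -> str:
    """Build a Cloud Logging filter for audit log entries in which a call
    made by the service account ended with PERMISSION_DENIED (code 7)."""
    return " AND ".join([
        'logName:"cloudaudit.googleapis.com"',
        f'protoPayload.authenticationInfo.principalEmail="{sa_email}"',
        "protoPayload.status.code=7",
    ])


def read_denied_requests_cmd(project_id: str, sa_email: str,
                             limit: int = 20) -> List[str]:
    """Build the gcloud command that reads the matching log entries as JSON."""
    return [
        "gcloud", "logging", "read", denied_requests_filter(sa_email),
        f"--project={project_id}",
        f"--limit={limit}",
        "--format=json",
    ]


if __name__ == "__main__":
    # Only print the commands; review them before running anything.
    print(shlex.join(grant_service_account_user_cmd(PROJECT_ID, SA_EMAIL)))
    print(shlex.join(read_denied_requests_cmd(PROJECT_ID, SA_EMAIL)))
```

The script only prints the two commands so they can be reviewed first; passing either list to subprocess.run(cmd, check=True) would execute it. The same filter string can also be pasted into the query box of the Logs Explorer.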