My question is technically independent of the context but I introduce it for the sake of clarity.
# export GOOGLE_APPLICATION_CREDENTIALS="/home/raffael/repos/dask/playground-310111-1d035231463d.json"
from dask_cloudprovider.gcp import GCPCluster
cluster = GCPCluster(
    projectid="project_id",
    n_workers=1,
    source_image="projects/ubuntu-os-cloud/global/images/ubuntu-minimal-1804-bionic-v20210325",
    zone="europe-west1-b",
)
I want to create a Dask cluster using the above service account (SA) and code.
If I attach the Project Owner role to that SA, it works. If I attach only the Compute Admin role to that SA, it fails.
For completeness' sake I list the error message but it doesn't say much more than that the supposedly created Scheduler instance doesn't exist.
Launching cluster with the following configuration:
Source Image: projects/ubuntu-os-cloud/global/images/ubuntu-minimal-1804-bionic-v20210325
Docker Image: daskdev/dask:latest
Machine Type: n1-standard-1
Filesytsem Size: 50
Disk Type: pd-standard
N-GPU Type:
Zone: europe-west1-b
Creating scheduler instance
Failed to find running VMI...
{'id': 'projects/project_id/zones/europe-west1-b/instances', 'selfLink': 'https://www.googleapis.com/compute/v1/projects/project_id/zones/europe-west1-b/instances', 'kind': 'compute#instanceList'}
Traceback (most recent call last):
  File "gcp_cluster.py", line 9, in <module>
    cluster = GCPCluster(
  File "/home/raffael/miniconda3/envs/dask/lib/python3.8/site-packages/dask_cloudprovider/gcp/instances.py", line 603, in __init__
    super().__init__(debug=debug, **kwargs)
  [...]
  File "/home/raffael/miniconda3/envs/dask/lib/python3.8/site-packages/dask_cloudprovider/gcp/instances.py", line 209, in create_vm
    while await self.update_status() != "RUNNING":
  File "/home/raffael/miniconda3/envs/dask/lib/python3.8/site-packages/dask_cloudprovider/gcp/instances.py", line 255, in update_status
    raise Exception(f"Missing Instance {self.name}")
Exception: Missing Instance dask-9645b903-scheduler
So, the big question is - what permission is missing?
In Operations Logging I found the following error which correlates with the failure but I'm not sure if it is telling me something about what service was used and what permission is required.
@type and logName say something about "audit", so I Ctrl-F-ed for "audit" in all of the Project Owner permissions and found container.auditSinks.* as a "promising" candidate, but adding those permissions doesn't solve the problem.
I guess that was a long shot anyway, but I'm not exactly sure what @type and logName refer to, so I just gave it a try.
Back to my question - does GCP tell me somewhere which API calls were rejected due to missing permissions?
(and of course - if you happen to just know what permissions/roles I need to add - that would be splendid)
Try granting the role roles/iam.serviceAccountUser to your SA at the project level. This allows the SA to impersonate other SAs in the project, including any SA used for Compute Engine. The documentation mentions that when you grant the Compute Admin role, you still need to assign the Service Account User role at the project or SA level.
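As a minimal sketch of that grant, the snippet below builds the corresponding `gcloud projects add-iam-policy-binding` command. The helper name `grant_role_command` and the SA email are hypothetical placeholders, not values taken from the question; substitute your own project ID and SA email before running anything.

```python
import shlex


def grant_role_command(project_id, sa_email, role="roles/iam.serviceAccountUser"):
    """Build the gcloud argv that binds `role` to the SA at the project level."""
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member=serviceAccount:{sa_email}",
        f"--role={role}",
    ]


# Hypothetical placeholders: replace with your real project ID and SA email.
cmd = grant_role_command("project_id", "dask-sa@project_id.iam.gserviceaccount.com")

# Print the command for review instead of executing it.
print(shlex.join(cmd))

# To apply the binding for real (requires an authenticated gcloud CLI):
# import subprocess
# subprocess.run(cmd, check=True)
```

Printing the command first makes it easy to check the member and role before changing the project's IAM policy.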
If this doesn't work, please share the tutorial or manual you are following for this Dask cluster, or the steps to reproduce the problem, so that a solution for this scenario can be found.
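On the more general question of which API calls were rejected, one possible approach is to query Cloud Audit Logs for failed calls made by the SA. The sketch below only builds a Logging filter and a `gcloud logging read` command. It assumes the rejected call was recorded in an audit log, and that permission failures carry status code 7 (PERMISSION_DENIED), which may not hold for every failure. The helper names and the SA email are hypothetical.

```python
import shlex


def failed_calls_filter(sa_email, only_permission_denied=False):
    """Build a Cloud Logging filter for failed audit-log entries of one SA."""
    clauses = [
        'logName:"cloudaudit.googleapis.com"',  # audit logs only
        f'protoPayload.authenticationInfo.principalEmail="{sa_email}"',
        "severity>=ERROR",
    ]
    if only_permission_denied:
        # Assumption: denied calls are reported with code 7 (PERMISSION_DENIED).
        clauses.append("protoPayload.status.code=7")
    return " AND ".join(clauses)


def read_logs_command(project_id, log_filter, limit=20):
    """Build the gcloud argv that reads matching log entries as JSON."""
    return [
        "gcloud", "logging", "read", log_filter,
        f"--project={project_id}",
        f"--limit={limit}",
        "--format=json",
    ]


# Hypothetical placeholders: replace with your real project ID and SA email.
log_filter = failed_calls_filter("dask-sa@project_id.iam.gserviceaccount.com")
read_cmd = read_logs_command("project_id", log_filter)
print(shlex.join(read_cmd))
```

In the returned entries, fields such as protoPayload.methodName and protoPayload.authorizationInfo are the places to look for the API method that was called and the permission that was checked.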