Given an operation_id from a long-running operation (starting a Dataproc cluster), I'm trying to get the operation instance in Python so that I can call operation.result() on it.
First, following the REST reference here, the generated GET request works as expected:
curl \
'https://dataproc.googleapis.com/v1/projects/myproject/regions/europe-west6/operations/some-operation-id?key=[YOUR_API_KEY]' \
--header 'Authorization: Bearer [YOUR_ACCESS_TOKEN]' \
--header 'Accept: application/json'
Also calling gcloud on the command line returns the operation correctly:
gcloud dataproc operations describe some-operation-id
Now I'm failing to replicate the same in Python. Here's what I've tried:
from google.api_core.client_options import ClientOptions
from google.api_core.operations_v1 import AbstractOperationsClient

client_options = ClientOptions(
    api_endpoint="dataproc.googleapis.com",
)
client = AbstractOperationsClient(client_options=client_options)
operation = client.get_operation(name="projects/myproject/regions/europe-west6/operations/some-operation-id")
This raises an error:
ValueError: Request {'name': 'projects/myproject/regions/europe-west6/operations/some-operation-id'} does not match any URL path template in available HttpRule's ['/v1/{name=operations/**}']
It looks like the path template is wrong: it only accepts names of the form operations/..., so I tried omitting the project and the region:
operation = client.get_operation(name="operations/some-operation-id")
This gets me past that error, but then the URL cannot be found:
google.api_core.exceptions.NotFound: 404 GET https://dataproc.googleapis.com:443/v1/operations/some-operation-id
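The two failures above can be reproduced without any network call. The sketch below uses a plain regular expression as a simplified stand-in for the '/v1/{name=operations/**}' template quoted in the error message. It is not the library's actual matching code, and matches_template is a hypothetical helper name used only for illustration.

```python
import re

# Simplified stand-in for the HttpRule template '/v1/{name=operations/**}':
# the name must start with 'operations/' followed by at least one character.
# Assumption: this approximates the library's matching, it does not reproduce it.
OPERATIONS_TEMPLATE = re.compile(r"^operations/.+$")


def matches_template(name: str) -> bool:
    """Return True if `name` fits the 'operations/**' pattern."""
    return OPERATIONS_TEMPLATE.match(name) is not None


full_name = "projects/myproject/regions/europe-west6/operations/some-operation-id"
short_name = "operations/some-operation-id"

print(matches_template(full_name))
print(matches_template(short_name))
```

The fully qualified name does not fit the pattern, which would explain the ValueError. The short name fits it but no longer carries the project and region, which is consistent with the 404.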
So my question is: how do I need to call client.get_operation to get the operation if I only have the project, region and operation_id?
Here's one way to achieve this, albeit using OperationsClient (not AbstractOperationsClient):
import os

from google.api_core import client_options, operations_v1
from google.cloud import dataproc_v1

project = os.getenv("PROJECT")
region = os.getenv("REGION")
cluster = os.getenv("CLUSTER")

# Use the regional Dataproc endpoint
api_endpoint = f"{region}-dataproc.googleapis.com:443"
options = client_options.ClientOptions(
    api_endpoint=api_endpoint,
)
client = dataproc_v1.ClusterControllerClient(
    client_options=options,
)

# Reuse the cluster client's gRPC channel for the operations client
# See: https://stackoverflow.com/questions/53317626
client = operations_v1.OperationsClient(client._transport._grpc_channel)

# List the operations in the project and region
name = f"projects/{project}/regions/{region}/operations"
resp = client.list_operations(
    name=name,
    filter_="{}",
)
# Either
# print(list(resp))
# Or
for foo in resp:
    print(foo)

# With a specific operation ID
operation = "..."
name = f"projects/{project}/regions/{region}/operations/{operation}"
resp = client.get_operation(name=name)
print(resp)
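The two strings that matter in the snippet above are the regional endpoint and the fully qualified operation name. As a minimal sketch, they can be built by small helpers. The function names regional_endpoint and operation_name are hypothetical (they are not part of any Google library), and the example values are the placeholders from the question.

```python
def regional_endpoint(region: str) -> str:
    """Build the regional Dataproc endpoint string used above."""
    return f"{region}-dataproc.googleapis.com:443"


def operation_name(project: str, region: str, operation_id: str) -> str:
    """Build the fully qualified operation resource name used above."""
    return f"projects/{project}/regions/{region}/operations/{operation_id}"


# Placeholder values taken from the question, not real resources
endpoint = regional_endpoint("europe-west6")
name = operation_name("myproject", "europe-west6", "some-operation-id")

print(endpoint)
print(name)
```

The endpoint would be passed as api_endpoint to ClientOptions, and the name as the name argument of get_operation, exactly as in the code above.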
One tool I used to check that we were on the right path was appending --log-http to the gcloud command:
gcloud dataproc operations list \
--project=${PROJECT} \
--region=${REGION} \
--log-http
Yielding:
=======================
==== request start ====
uri: https://dataproc.googleapis.com/v1/projects/{project}/regions/{region}/operations?alt=json&filter=%7B%7D&pageSize=100
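To compare the logged request with the Python call, the URI can be taken apart with the standard library. This is only a sketch of how one might read the log line: it shows that the percent-encoded filter decodes to "{}", the same value passed as filter_ in the Python code, and that the path ends in the same projects/.../regions/.../operations name.

```python
from urllib.parse import parse_qs, urlsplit

# URI copied from the --log-http output above ({project} and {region}
# are the placeholders as logged, not real values)
uri = (
    "https://dataproc.googleapis.com/v1/projects/{project}/regions/{region}"
    "/operations?alt=json&filter=%7B%7D&pageSize=100"
)

parts = urlsplit(uri)
params = parse_qs(parts.query)  # percent-decoding happens here

print(parts.path)
print(params["filter"])
print(params["pageSize"])
```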