Tags: python, google-cloud-platform, gcloud, google-cloud-dataproc, google-cloud-sdk

Get Dataproc operation in Python


I have an operation_id from a long-running operation (starting a Dataproc cluster), and I'm trying to get the operation instance in Python so that I can call operation.result() on it.

First, using the REST reference, the generated GET request works as expected:

curl \
  'https://dataproc.googleapis.com/v1/projects/myproject/regions/europe-west6/operations/some-operation-id?key=[YOUR_API_KEY]' \
  --header 'Authorization: Bearer [YOUR_ACCESS_TOKEN]' \
  --header 'Accept: application/json'

Calling gcloud on the command line also returns the operation correctly:

gcloud dataproc operations describe some-operation-id

However, I can't replicate the same in Python. Here's what I've tried:

from google.api_core.client_options import ClientOptions
from google.api_core.operations_v1 import AbstractOperationsClient

client_options = ClientOptions(
    api_endpoint="dataproc.googleapis.com",
)

client = AbstractOperationsClient(client_options=client_options)

operation = client.get_operation(name="projects/myproject/regions/europe-west6/operations/some-operation-id")

This raises an error:

ValueError: Request {'name': 'projects/myproject/regions/europe-west6/operations/some-operation-id'} does not match any URL path template in available HttpRule's ['/v1/{name=operations/**}']

It looks like the path template is wrong: it only accepts names starting with operations/..., so I tried omitting the project and the region:

operation = client.get_operation(name="operations/some-operation-id")

This gets me past that error, but then the URL cannot be found:

google.api_core.exceptions.NotFound: 404 GET https://dataproc.googleapis.com:443/v1/operations/some-operation-id

So my question is: how do I need to call client.get_operation to get the operation if I only have the project, region and operation_id?
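
To illustrate why the first call is rejected before any request is sent, here is a rough sketch of how a template such as /v1/{name=operations/**} can be matched against a request path. The function template_to_regex is a hypothetical, simplified stand-in written with the standard library only. It is not the matching code used by google.api_core, and it handles only the {var=pattern} form with * and ** wildcards.

```python
import re


def template_to_regex(template: str) -> re.Pattern:
    """Translate a simplified HTTP-rule path template into a compiled regex.

    Hypothetical helper for illustration: supports only `{var=pattern}`
    variables whose pattern is made of literals, `*` and `**`.
    """

    def convert_variable(match: re.Match) -> str:
        var, pattern = match.group(1), match.group(2)
        parts = []
        for segment in pattern.split("/"):
            if segment == "**":
                parts.append(".+")      # one or more characters, may span "/"
            elif segment == "*":
                parts.append("[^/]+")   # exactly one path segment
            else:
                parts.append(re.escape(segment))  # literal segment
        return f"(?P<{var}>{'/'.join(parts)})"

    regex = re.sub(r"\{(\w+)=([^}]+)\}", convert_variable, template)
    return re.compile(f"^{regex}$")


pattern = template_to_regex("/v1/{name=operations/**}")

# The fully qualified name does not start with "operations/", so no match.
full = "/v1/projects/myproject/regions/europe-west6/operations/some-operation-id"
print(pattern.match(full))

# The shortened name matches the template, so the request is sent.
short = "/v1/operations/some-operation-id"
print(pattern.match(short).group("name"))
```

Under this reading, the shortened name satisfies the template, but the resulting URL is evidently not one the service exposes, which is consistent with the 404 shown above.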


Solution

  • Here's one way to achieve this, albeit using OperationsClient (not AbstractOperationsClient). It uses the regional endpoint and the fully qualified operation name:

    import os
    
    from google.api_core import client_options, operations_v1
    from google.cloud import dataproc_v1
    
    project = os.getenv("PROJECT")
    region = os.getenv("REGION")
    
    # Use the regional Dataproc endpoint rather than the global one.
    api_endpoint = f"{region}-dataproc.googleapis.com:443"
    
    options = client_options.ClientOptions(
        api_endpoint=api_endpoint,
    )
    
    cluster_client = dataproc_v1.ClusterControllerClient(
        client_options=options,
    )
    
    # Reuse the cluster client's gRPC channel for a generic OperationsClient.
    # Note that this reaches into private attributes of the client.
    # See: https://stackoverflow.com/questions/53317626
    client = operations_v1.OperationsClient(cluster_client._transport._grpc_channel)
    
    # List all operations in the project and region
    name = f"projects/{project}/regions/{region}/operations"
    
    resp = client.list_operations(
        name=name,
        filter_="{}",
    )
    
    # Either
    # print(list(resp))
    # Or
    for op in resp:
        print(op)
    
    # With a specific operation ID
    operation = "..."
    name = f"projects/{project}/regions/{region}/operations/{operation}"
    
    resp = client.get_operation(name=name)
    print(resp)
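
Since the question starts from only a project, region and operation_id, the two strings the code above depends on can be built with small helpers. This is a minimal sketch using the standard library only. The function names dataproc_endpoint and operation_name are hypothetical, and the check for "/" is an added assumption to guard against passing an already qualified name by mistake.

```python
def dataproc_endpoint(region: str) -> str:
    """Return the regional endpoint in the form used in the answer above."""
    return f"{region}-dataproc.googleapis.com:443"


def operation_name(project: str, region: str, operation_id: str) -> str:
    """Build the fully qualified operation resource name."""
    fields = (
        ("project", project),
        ("region", region),
        ("operation_id", operation_id),
    )
    for label, value in fields:
        # Reject empty values and embedded "/" so the path shape stays fixed.
        if not value or "/" in value:
            raise ValueError(f"{label} must be a non-empty ID without '/': {value!r}")
    return f"projects/{project}/regions/{region}/operations/{operation_id}"


# Values taken from the question; pass the results to ClientOptions and
# client.get_operation(name=...) respectively.
print(dataproc_endpoint("europe-west6"))
print(operation_name("myproject", "europe-west6", "some-operation-id"))
```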
    

    One tool I used to confirm we were on the right path was appending --log-http to the gcloud command:

    gcloud dataproc operations list \
    --project=${PROJECT} \
    --region=${REGION} \
    --log-http
    

    Yielding:

    =======================
    ==== request start ====
    uri: https://dataproc.googleapis.com/v1/projects/{project}/regions/{region}/operations?alt=json&filter=%7B%7D&pageSize=100