Tags: c#, .net, google-cloud-platform, google-cloud-vertex-ai

gRPC Unimplemented thrown when creating a custom job in Vertex AI using JobServiceClient


I'm trying to write code that starts a custom job in Vertex AI.

I have no problem starting a custom job using gcloud:

gcloud ai custom-jobs --project my_project_id create --region=europe-west1 --display-name="train model based on custom container" --worker-pool-spec=machine-type=n1-standard-4,replica-count=1,container-image-uri=europe-west1-docker.pkg.dev/my_project_id/my-repo/my-custom-prototype:latest

I haven't been able to find an official code sample for .NET, so I tried to mimic a Python example I came across; ChatGPT also produced a similar code sample:

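using Google.Api.Gax.ResourceNames;  // LocationName
using Google.Cloud.AIPlatform.V1;    // JobServiceClient, CreateCustomJobRequest, CustomJob, ...

var projectId = "my_project_id";
var locationId = "europe-west1";
// Default client: uses Application Default Credentials; no endpoint is set explicitly.
var client = await JobServiceClient.CreateAsync();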

var createCustomJobRequest = new CreateCustomJobRequest
{
  ParentAsLocationName = new LocationName(projectId, locationId),
  CustomJob = new CustomJob
  {
    DisplayName = "train model based on custom container",
    JobSpec = new CustomJobSpec()
    {
      WorkerPoolSpecs =
      {
        new WorkerPoolSpec
        {
          MachineSpec = new MachineSpec
          {
            MachineType = "n1-standard-4"
          },
          ReplicaCount = 1,
          ContainerSpec = new ContainerSpec()
          {
            ImageUri = "europe-west1-docker.pkg.dev/my_project_id/my-repo/my-custom-prototype:latest"
          }
        }
      }
    }
  }
};

var result3 = await client.CreateCustomJobAsync(createCustomJobRequest); // exception thrown here

Unfortunately, I get an exception back:

Grpc.Core.RpcException: 'Status(StatusCode="Unimplemented", Detail="Bad gRPC response. HTTP status code: 404")'

Things I've tried without success

  1. Used the overload of CreateCustomJobAsync() that takes a parent and a CustomJob instead of a CreateCustomJobRequest object.
  2. Used JobServiceClientBuilder instead of JobServiceClient.CreateAsync() and set its Endpoint property to europe-west1-aiplatform.googleapis.com.

What am I missing to get a custom job started in Vertex AI?


Solution

  • I should have dug a bit deeper into JobServiceClientBuilder. Specifically, when I used the client created by the builder to start a job, I actually got a different error back:

    Grpc.Core.RpcException
      HResult=0x80131500
      Message=Status(StatusCode="PermissionDenied", Detail="Permission 'aiplatform.customJobs.create' denied on resource '//aiplatform.googleapis.com/projects/my_project_id/locations/europe-west1' (or it may not exist).")
      Source=Google.Api.Gax.Grpc
    

    Although this message was fairly clear, I wasn't sure it was the real error. The earlier Unimplemented error hadn't made sense either, so I dismissed this one too.

    Since writing the question, it occurred to me that gcloud and the SDK might authenticate differently. It turns out that the active account on the command line (the one marked with * in gcloud auth list) is my own user credential, while the environment variable GOOGLE_APPLICATION_CREDENTIALS references a service account. Once I granted the Vertex AI Administrator role to that service account, I was finally able to start a job.
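
    To see which identity the SDK will actually pick up, a small check along these lines can help. This is only a sketch, assuming the Google.Apis.Auth package (which the Vertex AI client library depends on); the CredentialInfo class and its method names are made up for illustration.

    ```csharp
    using System;
    using System.Threading.Tasks;
    using Google.Apis.Auth.OAuth2;

    // Hypothetical helper, not part of the SDK.
    public static class CredentialInfo
    {
        // Returns the service account ID (typically its e-mail address) when the
        // credential wraps a service account, otherwise the credential's type name.
        public static string Describe(GoogleCredential credential) =>
            credential.UnderlyingCredential is ServiceAccountCredential sa
                ? $"service account {sa.Id}"
                : credential.UnderlyingCredential.GetType().Name;

        // Resolves Application Default Credentials the same way the client
        // libraries do when no credential is supplied, then prints the result.
        public static async Task PrintDefaultAsync()
        {
            GoogleCredential credential = await GoogleCredential.GetApplicationDefaultAsync();
            Console.WriteLine(Describe(credential));
        }
    }
    ```

    Calling await CredentialInfo.PrintDefaultAsync() before creating the client shows whether the SDK is running as a service account or as a user, which can then be compared with the active account in gcloud auth list.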

    So, use JobServiceClient.CreateAsync() if the service account behind GOOGLE_APPLICATION_CREDENTIALS has the right permissions. If you need to use a different service account, instantiate a JobServiceClient like so:

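    // Requires: using Google.Apis.Auth.OAuth2; using Google.Cloud.AIPlatform.V1;
    var client = await new JobServiceClientBuilder
    {
        // Regional endpoint for the europe-west1 location used in the request.
        Endpoint = "europe-west1-aiplatform.googleapis.com",
        // Explicit service account key instead of Application Default Credentials.
        GoogleCredential = GoogleCredential.FromFile(@"your-service-account.json")
    }.BuildAsync();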
    

    I know the latter is "standard GCP authentication" knowledge; it just didn't come to mind immediately.
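
    Putting the pieces together, the whole flow might be sketched as below. It assumes the Google.Cloud.AIPlatform.V1 NuGet package; the VertexJobs class and its method names are hypothetical, and the host pattern in EndpointFor is simply generalized from the europe-west1 endpoint used above. The sketch always sets the regional endpoint explicitly, since that is the configuration under which the error changed from Unimplemented to PermissionDenied.

    ```csharp
    using System.Threading.Tasks;
    using Google.Api.Gax.ResourceNames;
    using Google.Apis.Auth.OAuth2;
    using Google.Cloud.AIPlatform.V1;

    // Hypothetical helper, not part of the SDK.
    public static class VertexJobs
    {
        // Regional API host, following the pattern of the endpoint used above.
        public static string EndpointFor(string locationId) =>
            $"{locationId}-aiplatform.googleapis.com";

        // Mirrors the single worker pool from the gcloud command in the question.
        public static CustomJob BuildJob(string displayName, string machineType, string imageUri) =>
            new CustomJob
            {
                DisplayName = displayName,
                JobSpec = new CustomJobSpec
                {
                    WorkerPoolSpecs =
                    {
                        new WorkerPoolSpec
                        {
                            MachineSpec = new MachineSpec { MachineType = machineType },
                            ReplicaCount = 1,
                            ContainerSpec = new ContainerSpec { ImageUri = imageUri }
                        }
                    }
                }
            };

        // Creates a client bound to the job's region and submits the job.
        public static async Task<CustomJob> StartAsync(
            string projectId, string locationId, CustomJob job, GoogleCredential credential)
        {
            JobServiceClient client = await new JobServiceClientBuilder
            {
                Endpoint = EndpointFor(locationId),
                GoogleCredential = credential
            }.BuildAsync();

            return await client.CreateCustomJobAsync(new LocationName(projectId, locationId), job);
        }
    }
    ```

    The credential argument can come from GoogleCredential.GetApplicationDefaultAsync() or from GoogleCredential.FromFile(...), depending on which service account should run the job. Either way, that account needs the aiplatform.customJobs.create permission named in the error message.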