I'm trying to implement code to start a custom job in Vertex.
I have no problem starting a custom job using gcloud:
gcloud ai custom-jobs --project my_project_id create --region=europe-west1 --display-name="train model based on custom container" --worker-pool-spec=machine-type=n1-standard-4,replica-count=1,container-image-uri=europe-west1-docker.pkg.dev/my_project_id/my-repo/my-custom-prototype:latest
I haven't been able to find an official code sample for .NET, so I tried to mimic someone else's Python version, and ChatGPT produced a similar code sample:
var projectId = "my_project_id";
var locationId = "europe-west1";
var client = await JobServiceClient.CreateAsync();
var createCustomJobRequest = new CreateCustomJobRequest
{
    ParentAsLocationName = new LocationName(projectId, locationId),
    CustomJob = new CustomJob
    {
        DisplayName = "train model based on custom container",
        JobSpec = new CustomJobSpec()
        {
            WorkerPoolSpecs =
            {
                new WorkerPoolSpec
                {
                    MachineSpec = new MachineSpec
                    {
                        MachineType = "n1-standard-4"
                    },
                    ReplicaCount = 1,
                    ContainerSpec = new ContainerSpec()
                    {
                        ImageUri = "europe-west1-docker.pkg.dev/my_project_id/my-repo/my-custom-prototype:latest"
                    }
                }
            }
        }
    }
};
var result3 = await client.CreateCustomJobAsync(createCustomJobRequest); // exception thrown here
Unfortunately, I get an exception back:
Grpc.Core.RpcException: 'Status(StatusCode="Unimplemented", Detail="Bad gRPC response. HTTP status code: 404")'
Things I've tried that also failed:
- Calling the CreateCustomJobAsync() overload that takes a CustomJob and a Parent instead of a CreateCustomJobRequest object.
- Using JobServiceClientBuilder instead of JobServiceClient.CreateAsync() and setting the Endpoint argument to europe-west1-aiplatform.googleapis.com.
What am I missing to get a custom job started in Vertex AI?
I should have dug a bit more around JobServiceClientBuilder. Specifically, when using the builder's client object to start a job, I actually got a different message back:
Grpc.Core.RpcException
HResult=0x80131500
Message=Status(StatusCode="PermissionDenied", Detail="Permission 'aiplatform.customJobs.create' denied on resource '//aiplatform.googleapis.com/projects/my_project_id/locations/europe-west1' (or it may not exist).")
Source=Google.Api.Gax.Grpc
While the message was fairly clear, I wasn't sure it was the real error. The earlier Unimplemented error hadn't made sense either, so I dismissed this one too.
Anyway, after writing the question it occurred to me that gcloud and the SDK might authenticate differently. It turns out that the active user on the command line (the one marked with * in gcloud auth list) is my own credential, while the environment variable GOOGLE_APPLICATION_CREDENTIALS references a service account. Once I added the Vertex AI Administrator role to that service account, I was finally able to start a job.
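To see which identity the SDK actually picks up, as opposed to the one gcloud uses, something like the following may help. It is a minimal sketch, assuming the Google.Apis.Auth package is referenced and that Application Default Credentials resolve through GOOGLE_APPLICATION_CREDENTIALS; the class name WhoAmI is made up for illustration.

```csharp
using System;
using System.Threading.Tasks;
using Google.Apis.Auth.OAuth2;

// Hypothetical helper: prints the identity behind Application Default Credentials.
public static class WhoAmI
{
    public static async Task Main()
    {
        // Resolves credentials the way the client libraries do by default
        // (GOOGLE_APPLICATION_CREDENTIALS first, if it is set).
        GoogleCredential credential = await GoogleCredential.GetApplicationDefaultAsync();

        if (credential.UnderlyingCredential is ServiceAccountCredential serviceAccount)
        {
            // For a service account key file, Id is the service account's e-mail address.
            Console.WriteLine($"SDK runs as service account: {serviceAccount.Id}");
        }
        else
        {
            // For example a user credential created by gcloud auth application-default login.
            Console.WriteLine($"SDK uses credential type: {credential.UnderlyingCredential.GetType().Name}");
        }
    }
}
```

If the printed account differs from the active one in gcloud auth list, the two tools are running under different identities, and the permission has to be granted to the account shown here.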
So, use JobServiceClient.CreateAsync() if the service account behind GOOGLE_APPLICATION_CREDENTIALS has the right permission. If you need to use a different service account, instantiate a JobServiceClient like so:
var client = await new JobServiceClientBuilder
{
    Endpoint = "europe-west1-aiplatform.googleapis.com",
    GoogleCredential = GoogleCredential.FromFile(@"your-service-account.json")
}.BuildAsync();
I know the latter is "standard GCP authentication" knowledge; it just didn't come to mind immediately.
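Putting the pieces together, an end-to-end version could look roughly like this. It is a sketch rather than an official sample: it assumes the Google.Cloud.AIPlatform.V1 package, reuses the placeholder project, region, and image from the question, and assumes the regional endpoint is what turned the Unimplemented/404 error into the more useful PermissionDenied one, as my experience above suggests. The class name StartCustomJob is made up.

```csharp
using System;
using System.Threading.Tasks;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.AIPlatform.V1;
using Grpc.Core;

public static class StartCustomJob
{
    public static async Task Main()
    {
        var projectId = "my_project_id";   // placeholder from the question
        var locationId = "europe-west1";

        // Point the client at the regional endpoint that matches the job's location.
        // Without an explicit credential, Application Default Credentials are used;
        // set GoogleCredential on the builder to use a different service account.
        var client = await new JobServiceClientBuilder
        {
            Endpoint = $"{locationId}-aiplatform.googleapis.com"
        }.BuildAsync();

        var customJob = new CustomJob
        {
            DisplayName = "train model based on custom container",
            JobSpec = new CustomJobSpec
            {
                WorkerPoolSpecs =
                {
                    new WorkerPoolSpec
                    {
                        MachineSpec = new MachineSpec { MachineType = "n1-standard-4" },
                        ReplicaCount = 1,
                        ContainerSpec = new ContainerSpec
                        {
                            ImageUri = "europe-west1-docker.pkg.dev/my_project_id/my-repo/my-custom-prototype:latest"
                        }
                    }
                }
            }
        };

        try
        {
            // Overload that takes the parent location and the job directly.
            CustomJob created = await client.CreateCustomJobAsync(
                new LocationName(projectId, locationId), customJob);
            Console.WriteLine($"Created {created.Name} (state: {created.State})");
        }
        catch (RpcException e) when (e.StatusCode == StatusCode.PermissionDenied)
        {
            // Take this error at face value: the credential in use lacks
            // aiplatform.customJobs.create on the project/location.
            Console.WriteLine($"Permission problem: {e.Status.Detail}");
        }
    }
}
```

Catching PermissionDenied separately is only there to make the point of this answer explicit: once the endpoint is right, that error means exactly what it says about the credential in use.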