I am migrating a model that was previously running on GCP AI Platform to Vertex AI [1, 2].
The setup I am aiming for is as follows:
- Vertex AI custom job
- with pre-built containers (using the gcloud CLI)
- a python-module that contains the training code for our model

Can someone tell me whether there is something wrong with the sequence of steps below?
The python module itself does not seem to be the cause of the problem, since the same code currently runs fine on AI Platform.
# simplified python module structure
# ./vertex-ai-poc
# ├── __init__.py
# ├── trainer
# │ ├── __init__.py
# │ └── task.py
# └── setup.py
python3 ./[PATH]/vertex-ai-poc/setup.py sdist --formats=gztar
# -> dist generated
gsutil cp dist/trainer-0.2.tar.gz gs://[PROJECT_ID]/vertex-ai-poc/trainer-0.2.tar.gz
# -> uploaded correctly
gcloud ai custom-jobs create \
--region us-central1 \
--display-name=vertex-ai-poc \
--project=[PROJECT_ID] \
--python-package-uris='gs://[PROJECT_ID]/vertex-ai-poc/trainer-0.2.tar.gz' \
--worker-pool-spec=machine-type=e2-standard-4,replica-count=1,executor-image-uri='us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-7:latest',python-module=trainer.task
However, I am encountering the error below.
file:///user_dir/trainer-0.2.tar.gz does not appear to be a Python project: neither 'setup.py' nor 'pyproject.toml' found.
Note that file:/// has three slashes. I believe this has something to do with Docker [3].
References
I ended up fixing the problem, so I'll share the details for anyone with a similar error. The problem was that I wasn't using find_packages() correctly.
First, there are three possible ways of submitting custom Vertex AI jobs. The third has two variants:
3.1 the local-package-path param
3.2 the --python-package-uris flag

(I believe) methods 1, 2, and 3.1 build Docker images on the local machine and submit the built image to Vertex AI. Method 3.2 simply uses the pre-built container given by executor-image-uri and installs the Python packages into it on Vertex AI.
The problem was that I ran the command below, which generates the dist package, from ../.. with a ./[PATH]/ prefix. find_packages() searches relative to the current working directory, so it did not return the right packages, and both methods 3.1 and 3.2 failed to run correctly.
# Wrong (run from ../..): python3 ./[PATH]/vertex-ai-poc/setup.py sdist --formats=gztar
# Right (run from the directory that contains setup.py):
python3 ./setup.py sdist --formats=gztar
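One way to make the build independent of where it is launched is to pin the working directory in a small wrapper. The sketch below is only an illustration: `sdist_command` and `build_sdist` are hypothetical helper names, not part of setuptools or gcloud.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def sdist_command(setup_py: Path) -> tuple[list[str], Path]:
    """Return the sdist command and the directory it has to run in."""
    setup_py = setup_py.resolve()
    # Refer to setup.py by its bare name and run from its own directory,
    # so the relative search done by find_packages() starts in the right place.
    cmd = [sys.executable, setup_py.name, "sdist", "--formats=gztar"]
    return cmd, setup_py.parent


def build_sdist(setup_py: Path) -> None:
    """Build the source distribution next to setup.py (hypothetical helper)."""
    cmd, cwd = sdist_command(setup_py)
    subprocess.run(cmd, cwd=cwd, check=True)
```

Calling `build_sdist` with the path to setup.py, from any directory, should then leave the archive in the dist/ folder next to setup.py.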
from setuptools import find_packages, setup
setup(
name='trainer',
version='0.1',
packages=find_packages(), # <-- HERE
include_package_data=True,
)
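To see why the working directory matters, here is a minimal sketch that calls find_packages() from two different directories. The temporary layout is a simplified stand-in for the real project (it omits setup.py and the top-level __init__.py), and `packages_seen_from` and `demo` are hypothetical names used only for illustration.

```python
import os
import tempfile
from pathlib import Path

from setuptools import find_packages


def packages_seen_from(cwd):
    """Call find_packages() as if setup.py had been launched from `cwd`."""
    previous = os.getcwd()
    os.chdir(cwd)
    try:
        return find_packages()
    finally:
        os.chdir(previous)


def demo():
    """Build a throwaway project and compare the two working directories."""
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "vertex-ai-poc"
        (project / "trainer").mkdir(parents=True)
        (project / "trainer" / "__init__.py").touch()
        (project / "trainer" / "task.py").touch()
        from_project = packages_seen_from(project)  # run next to the package
        from_parent = packages_seen_from(tmp)       # run from one level up
    return from_project, from_parent


if __name__ == "__main__":
    from_project, from_parent = demo()
    print("from project dir:", from_project)
    print("from parent dir: ", from_parent)
```

With this layout, the call made inside the project directory should find the trainer package, while the call made from the parent directory should not.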
With the package built correctly, both the local-package-path and the --python-package-uris variants work with the commands below.
With the local-package-path param:
gcloud ai custom-jobs create \
--region us-central1 \
--display-name=vertex-ai-poc \
--project=[PROJECT_ID] \
--worker-pool-spec=machine-type=e2-standard-4,replica-count=1,executor-image-uri='us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-7:latest',script=task.py,local-package-path=vertex-ai-poc/trainer
With the --python-package-uris flag:
gcloud ai custom-jobs create \
--region us-central1 \
--display-name=vertex-ai-poc \
--project=[PROJECT_ID] \
--python-package-uris='gs://[PROJECT_ID]/vertex-ai-poc/trainer-0.1.tar.gz' \
--worker-pool-spec=machine-type=e2-standard-4,replica-count=1,executor-image-uri='us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-7:latest',python-module=trainer.task
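Since the original error complained that neither setup.py nor pyproject.toml was found, it can help to inspect the archive before uploading it with gsutil. The following is a rough sketch under stated assumptions: `check_sdist` is a hypothetical helper, and it assumes the usual sdist layout of a single top-level directory with setup.py directly inside it.

```python
import tarfile
from pathlib import PurePosixPath


def check_sdist(archive_path, module="trainer.task"):
    """Return a list of problems found in the sdist; an empty list means none."""
    with tarfile.open(archive_path, "r:gz") as tar:
        names = [PurePosixPath(name) for name in tar.getnames()]

    # An sdist is assumed to unpack into exactly one top-level directory.
    roots = {path.parts[0] for path in names if path.parts}
    if len(roots) != 1:
        return [f"expected one top-level directory, found {sorted(roots)}"]
    root = PurePosixPath(next(iter(roots)))

    problems = []
    # The installer looks for one of these files at the project root.
    if root / "setup.py" not in names and root / "pyproject.toml" not in names:
        problems.append("no setup.py or pyproject.toml at the archive root")
    # The module passed as python-module must be shipped as well.
    module_file = root.joinpath(*module.split(".")).with_suffix(".py")
    if module_file not in names:
        problems.append(f"{module_file} is missing")
    return problems
```

For example, `check_sdist("dist/trainer-0.1.tar.gz")` returning an empty list suggests the archive at least has the shape the job expects. It does not prove that the job will run.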