Tags: google-cloud-platform, google-cloud-vertex-ai

Vertex AI updating dataset and train model


I am really confused about organizing a Google Vertex AI dataset and training an AutoML model in GCP. Could anyone please help me understand?

Let me explain scenarios in which I have confusion.

Suppose I have a text entity extraction dataset in Vertex AI named “contract_delivery_02” with 25 files. I have created 3 labels (DelIncoTerms, DelLocation and DelWindow) and trained a model. This is working great.


Now I have 10 more files to upload, and in them I have introduced 2 additional labels (DelPrice & DelDelivery).

My questions:

  1. Do I need to upload all the files (25 + 10) again?
  2. Do I need to retrain the whole AutoML model again, or is there another approach for this scenario?

Solution

  • For question #1, you don't have to upload all the files again. In your dataset, just add your 2 new labels and then upload the additional 10 files.

    Once uploaded, you can proceed to label the newly added files (in your example, 10 files) and then assign the new labels across ALL files (25 + 10) where they apply. You can do this by double-clicking a text item in the UI and assigning the necessary labels.
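    As an illustration, labels can also be supplied at import time in the JSONL file you upload to Cloud Storage: each document is one JSON object pairing the text with `textSegmentAnnotations`, and any `displayName` not yet in the dataset (like your new labels) is created automatically. The contract text and character offsets below are made up for this example, and the object is shown wrapped for readability (it must be a single line in the actual file):

    ```json
    {"textContent": "Delivery price: USD 4,500. Delivery by road freight.",
     "textSegmentAnnotations": [
       {"startOffset": 16, "endOffset": 25, "displayName": "DelPrice"},
       {"startOffset": 39, "endOffset": 51, "displayName": "DelDelivery"}
     ]}
    ```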

    For question #2, since there are newly added labels and training texts, you do need to retrain the whole AutoML model to get a more accurate model and better-quality results.

    You may refer to the Text Entity Extraction documentation on preparing data and training models for more details.
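If you prefer scripting this instead of using the console UI, the two steps above (import more labeled files into the existing dataset, then retrain) can be sketched with the Vertex AI Python SDK. This is a sketch, not a drop-in script: the project, region, dataset ID, GCS URI, and display names below are placeholders you would replace with your own values.

```python
def import_files_and_retrain(project, location, dataset_id, gcs_jsonl_uri):
    """Sketch: add labeled files to an existing Vertex AI text dataset
    and retrain an AutoML entity-extraction model on the full dataset.
    All IDs, URIs, and display names are placeholder assumptions."""
    # Imported inside the function so the sketch can be read (and the
    # function defined) without the SDK installed.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=location)

    # Load the existing dataset (e.g. "contract_delivery_02") by its ID
    # instead of creating a new one.
    dataset = aiplatform.TextDataset(dataset_name=dataset_id)

    # Import the 10 new files from a JSONL in Cloud Storage; labels not
    # previously in the dataset (DelPrice, DelDelivery) are created from
    # the annotations' displayName values.
    dataset.import_data(
        gcs_source=gcs_jsonl_uri,
        import_schema_uri=aiplatform.schema.dataset.ioformat.text.extraction,
    )

    # Retraining runs on the whole dataset (all 25 + 10 files) and
    # produces a new model; the old model remains available.
    job = aiplatform.AutoMLTextTrainingJob(
        display_name="contract-extraction-retrain",  # placeholder name
        prediction_type="extraction",
    )
    return job.run(dataset=dataset, model_display_name="contract-extraction-v2")
```

Note that `job.run` blocks until training finishes, which for AutoML text models can take several hours.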