I am really confused about how to organize a Google Vertex AI dataset and train an AutoML model in GCP. Could anyone please help me understand?
Let me explain the scenario I am confused about.
Suppose I have a text entity extraction dataset in Vertex AI called “contract_delivery_02” with 25 files. I have created 3 labels (DelIncoTerms, DelLocation and DelWindow) and trained a model. This is working great.
Now I have 10 more files to upload, in which I have introduced 2 additional labels (DelPrice & DelDelivery).
My questions:
1. Do I have to upload all the files again (for example, into a new dataset), or can I add the 10 new files and the 2 new labels to the existing dataset?
2. Do I have to retrain the model after adding the new files and labels?
For question #1, you don't have to upload all the files again. In your existing dataset, you just add the 2 new labels and then upload the additional 10 files.
Once they are uploaded, label the newly added files (in your example, 10 files) and then go back and assign the new labels across ALL files (25 + 10); otherwise the original 25 files will look to the model as if the new entities never occur in them. You can do this by double-clicking an item's text in the UI and assigning the necessary labels.
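If you prefer to script this instead of using the Cloud Console UI, here is a minimal sketch using the google-cloud-aiplatform Python SDK. The project ID, dataset ID, and GCS path are placeholders you would replace with your own; the JSONL import file is assumed to contain annotations that reference the new labels.

```python
from google.cloud import aiplatform

# Placeholder project and region -- replace with your own values.
aiplatform.init(project="my-project", location="us-central1")

# Look up the existing text dataset (e.g. "contract_delivery_02") by its numeric resource ID.
dataset = aiplatform.TextDataset("1234567890123456789")

# Import the 10 additional files into the same dataset. The JSONL import file
# can reference the new labels (DelPrice, DelDelivery) in its annotations.
dataset.import_data(
    gcs_source="gs://my-bucket/contract_delivery_02/additional_import.jsonl",
    import_schema_uri=aiplatform.schema.dataset.ioformat.text.extraction,
    sync=True,
)
```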
For question #2, since there are newly added labels and training texts, you need to retrain the AutoML model on the updated dataset to get accurate predictions for the new entities and better overall quality of results.
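If you want to kick off the retraining programmatically rather than from the console, a minimal sketch (again assuming the google-cloud-aiplatform SDK and placeholder names) looks like this. Each training run produces a new model from the current state of the dataset.

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Reuse the dataset that now contains all 35 labelled files.
dataset = aiplatform.TextDataset("1234567890123456789")

# Define a new AutoML entity extraction training job; training always builds
# a fresh model rather than updating the previously trained one in place.
job = aiplatform.AutoMLTextTrainingJob(
    display_name="contract_delivery_02_extraction_v2",
    prediction_type="extraction",
)

model = job.run(
    dataset=dataset,
    model_display_name="contract_delivery_02_model_v2",
    training_fraction_split=0.8,
    validation_fraction_split=0.1,
    test_fraction_split=0.1,
)
print(model.resource_name)
```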
You may refer to the Text Entity Extraction data preparation and model training documentation for more details.