python yolo yolov8 roboflow ultralytics

Update Yolo V8 training with another dataset


I trained a YOLOv8 model on a dataset downloaded from here, using the Ultralytics and Roboflow libraries. I used the following command (starting from the pre-trained model yolov8n.pt, downloadable here):

yolo task=detect mode=train model=C:\Training\yolov8n.pt data=C:\DATASET\DIRECTORY\data.yaml epochs=20 imgsz=640

This command created a file named best.pt in:

C:\Users\USERNAME\runs\detect\train\weights

which I was able to use for detecting objects. However, I'm not completely satisfied with the results, so I'd like to continue training on another dataset while keeping what the model learned in the first training. Is there a specific command for doing so?

Thank you!


Solution

  • The easiest approach is to start training on the new dataset from your best.pt checkpoint, with data= pointing to the new dataset's data.yaml:

    yolo task=detect mode=train model=C:\Users\USERNAME\runs\detect\train\weights\best.pt data=C:\DATASET\DIRECTORY\data.yaml epochs=20 imgsz=640
    

    There are some nuances.

    • You can't simply add the results of separate training runs together. Here you only start the new training from an already good point, and the weights learned in the first run can be overwritten during the second one, especially if the new dataset has a different class list or its images differ substantially from those in the first dataset. This is known as "catastrophic forgetting".
    • Consider lowering the learning rate for the second training run to preserve more of what the model learned in the first one. You may also want to freeze the first n layers of the trained model and train only the remaining layers. The available training arguments are listed here: https://docs.ultralytics.com/modes/train/#arguments.

    If you need the model to give equal weight to the information from both datasets, it is better to merge them into a single dataset and train the model on the combined data.
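To make the learning-rate and freezing advice concrete, here is a small sketch that assembles the fine-tuning command. finetune_command is a hypothetical helper, not part of Ultralytics. lr0, optimizer and freeze are real train arguments (see the arguments page linked above), but lr0=0.001 and freeze=10 are only assumed starting points to tune. The optimizer is set explicitly because, in recent Ultralytics versions, the default optimizer=auto picks its own learning rate and ignores lr0.

```python
# Hypothetical helper (not part of Ultralytics): builds the `yolo` CLI command
# for a second training run that starts from an existing checkpoint.


def finetune_command(weights, data, epochs=20, imgsz=640,
                     lr0=0.001, optimizer="SGD", freeze=10):
    """Return a `yolo ... mode=train` command string.

    lr0=0.001 and freeze=10 are illustrative assumptions, not tuned values.
    optimizer is set explicitly because optimizer=auto ignores lr0.
    """
    args = {
        "task": "detect",
        "mode": "train",
        "model": weights,      # checkpoint from the first training (best.pt)
        "data": data,          # data.yaml of the NEW dataset
        "epochs": epochs,
        "imgsz": imgsz,
        "lr0": lr0,            # initial learning rate (Ultralytics default: 0.01)
        "optimizer": optimizer,
        "freeze": freeze,      # freeze the first `freeze` layers of the model
    }
    # The yolo CLI expects space-separated key=value pairs.
    return "yolo " + " ".join(f"{key}={value}" for key, value in args.items())


if __name__ == "__main__":
    print(finetune_command(
        r"C:\Users\USERNAME\runs\detect\train\weights\best.pt",
        r"C:\DATASET\DIRECTORY\data.yaml",
    ))
```

Running it prints the complete command, ready to paste into the same terminal used for the first training. With the Python API, the same settings can be passed as keyword arguments, e.g. YOLO(path_to_best_pt).train(data=..., epochs=20, imgsz=640, lr0=0.001, optimizer="SGD", freeze=10).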
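For the merging option, a minimal sketch follows. It assumes both datasets use the YOLO label format that Roboflow exports for YOLOv8 (an images and a labels folder per split, one .txt file per image with lines of the form class_id x_center y_center width height), and that class IDs are indices into each dataset's own names list in its data.yaml. Because those lists can differ, the hypothetical merge_split helper remaps the IDs to a shared list while copying one split; writing the merged data.yaml with the combined names list is left to you.

```python
"""Sketch: merge one split of a YOLO-format dataset into a combined dataset.

Assumed layout (as in a typical Roboflow YOLOv8 export):
    <dataset>/<split>/images/<name>.<ext>
    <dataset>/<split>/labels/<name>.txt   (lines: "class_id xc yc w h")
"""
import shutil
from pathlib import Path


def build_class_map(old_names, merged_names):
    """Map each old class index to its index in the merged name list.

    Raises ValueError if a class name is missing from merged_names.
    """
    return {i: merged_names.index(name) for i, name in enumerate(old_names)}


def merge_split(src_split, dst_split, old_names, merged_names, prefix):
    """Copy images and class-remapped labels from src_split into dst_split.

    `prefix` is prepended to file names so that images with the same name
    in the two datasets do not overwrite each other.
    """
    class_map = build_class_map(old_names, merged_names)
    src_split, dst_split = Path(src_split), Path(dst_split)
    (dst_split / "images").mkdir(parents=True, exist_ok=True)
    (dst_split / "labels").mkdir(parents=True, exist_ok=True)

    for image in (src_split / "images").iterdir():
        shutil.copy2(image, dst_split / "images" / f"{prefix}{image.name}")
        # YOLO pairs an image with the label file that has the same stem.
        label = src_split / "labels" / f"{image.stem}.txt"
        if not label.exists():
            continue  # background image: no label file to copy
        new_lines = []
        for line in label.read_text().splitlines():
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            parts[0] = str(class_map[int(parts[0])])  # remap the class ID only
            new_lines.append(" ".join(parts))
        out = dst_split / "labels" / f"{prefix}{image.stem}.txt"
        out.write_text("\n".join(new_lines) + ("\n" if new_lines else ""))
```

For example, with dataset A labelled ["car", "person"] and dataset B labelled ["person", "bicycle"], a merged list ["car", "person", "bicycle"] gives B the mapping {0: 1, 1: 2}, and you would call merge_split once per dataset and split with different prefixes. One thing to check first: if B's images contain cars that were never annotated, training on the merged data treats those cars as background.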