deep-learning · object-detection · yolo · pre-trained-model

What are backend weights in deep learning models (yolo)?


I'm pretty new to deep learning, but I couldn't seem to find out or figure out what backend weights are, such as

full_yolo_backend.h5
squeezenet_backend.h5

From what I have found and experimented with, these backend weights have fundamentally different model architectures. For example:

  • the yolov2 model has 40+ layers, but the backend only 20+ layers (?)
  • you can build on top of the backend model with your own networks (?)
  • using backend models tends to yield poorer results (?)

I was hoping to seek some explanation on backend weights vs actual models for learning purposes. Thank you so much!


Solution

  • I'm not sure which implementation you are using, but in many applications you can consider a deep model as a feature extractor whose output is more or less task-agnostic, followed by a number of task-specific heads.

    The choice of backend depends on your specific constraints in terms of the tradeoff between accuracy and computational complexity. Examples of classical but time-consuming choices for backends are ResNet-101, ResNet-50 or VGG, which can be coupled with FPNs (feature pyramid networks) to yield multiscale features. However, if speed is your main concern, you can use smaller backends such as the various MobileNet architectures, or even vanilla networks such as the ones used in the original YOLOv1/v2 papers (TinyYOLO is an extreme case).
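As a concrete sketch in tf.keras (MobileNetV2 and the 416x416 input size are illustrative choices; `weights=None` keeps the example offline, whereas `weights="imagenet"` would actually load pretrained backend weights, which is what a `*_backend.h5` file corresponds to):

```python
import tensorflow as tf

# Load a backbone as a task-agnostic feature extractor.
# include_top=False drops the classification head; weights="imagenet"
# would download pretrained weights (weights=None keeps this sketch offline).
backbone = tf.keras.applications.MobileNetV2(
    input_shape=(416, 416, 3),
    include_top=False,
    weights=None,
)
backbone.trainable = False  # freeze the backbone while training the head

print(backbone.output_shape)  # (None, 13, 13, 1280): a 13x13 grid of features
```

Note that the backbone ends in a spatial feature map, not class scores; that grid is exactly what a detection head consumes.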

    Once you have chosen your backend (you can use a pretrained one), you can load its weights (that is what your *.h5 files are). On top of that, you will add a small head that will carry out the tasks that you need: this can be classification, bbox regression, or, like in MaskRCNN, foreground/background segmentation. For Yolov2, you can add just a few, for example 3, convolutional layers (with non-linearities of course) that will output a tensor of size

    BxC1xC2xAxP
    # B  == batch size
    # C1 == number of vertical cells
    # C2 == number of horizontal cells
    # A  == number of anchors
    # P  == number of parameters per anchor (i.e. bbox parameters, class predictions, confidence)
    
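A minimal head along these lines could look as follows (a tf.keras sketch; the 13x13 grid, 1280 input channels, A=5 anchors and P=25 parameters per anchor, i.e. 4 bbox values + 1 confidence + 20 class scores, are assumptions matching a VOC-style Yolov2 setup):

```python
import tensorflow as tf

A, P = 5, 25  # 5 anchors; 25 = 4 bbox params + 1 confidence + 20 class scores

# A small head on top of a 13x13x1280 backbone feature map: a few
# convolutions with non-linearities, then a linear 1x1 output convolution.
head = tf.keras.Sequential([
    tf.keras.Input(shape=(13, 13, 1280)),
    tf.keras.layers.Conv2D(512, 3, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(512, 3, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(A * P, 1),         # linear output conv
    tf.keras.layers.Reshape((13, 13, A, P)),  # B x C1 x C2 x A x P
])

print(head.output_shape)  # (None, 13, 13, 5, 25)
```

The final Reshape just makes the per-anchor structure of the output explicit; the loss function then indexes into the last axis for the bbox, confidence and class terms.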

    Then, you can just save/load the weights of this head separately. When you are happy with your results, though, training jointly (end-to-end) will usually give you a small boost in accuracy.
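Saving and restoring just the head could look like this (the file name and the tiny one-layer head are illustrative; the `.weights.h5` suffix keeps the call compatible with recent Keras versions):

```python
import numpy as np
import tensorflow as tf

def make_head():
    # The same head architecture must be rebuilt before weights can be restored.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(13, 13, 1280)),
        tf.keras.layers.Conv2D(125, 1, name="yolo_out"),
    ])

head = make_head()
head.save_weights("yolo_head.weights.h5")  # only the head's parameters

restored = make_head()
restored.load_weights("yolo_head.weights.h5")

# The restored head reproduces the saved weights exactly.
assert all(np.array_equal(a, b)
           for a, b in zip(head.get_weights(), restored.get_weights()))
```

This is also why the backend and head weights can live in separate `.h5` files: each file only holds the parameters of the sub-model it was saved from.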

    Finally, to come back to your last question, I assume that you are getting poor results with the backends because you are only loading the backend weights but not the weights of the heads. Another possibility is that you are using a head trained with a backend X but switching the backend to Y. In that case, since the head expects different features, it's natural to see a drop in performance.