Tags: machine-learning, neural-network, artificial-intelligence, object-detection, yolo

Unable to understand YOLOv4 architecture


I was going through the YOLOv4 paper, where the authors describe a backbone (CSPDarknet53), a neck (SPP followed by PANet), and then a head (YOLOv3). So is the architecture something like this:

CSP Darknet-53-->SPP-->PANet-->YOLOv3(106 layers of YOLOv3).

Does this mean YOLOv4 incorporates entire YOLOv3?


Solution

  • First, what is YOLOv3 composed of?

    YOLOv3 is composed of two parts:

    1. Backbone or Feature Extractor --> Darknet53
    2. Head or Detection Blocks --> 53 layers

    The head is used for (1) localizing bounding boxes and (2) identifying the class of the object inside each box.
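
To make the head's two jobs concrete, here is a minimal sketch of how the size of its output follows from them. It assumes the standard YOLOv3 setup (3 anchors per scale, detection at strides 8, 16 and 32); the function names are made up for illustration and are not part of Darknet or any library.

```python
def head_output_channels(num_classes: int, anchors_per_scale: int = 3) -> int:
    """Channels produced by one detection layer of the head."""
    # Per anchor: 4 box coordinates (localization) + 1 objectness score
    # + one score per class (classification).
    return anchors_per_scale * (4 + 1 + num_classes)


def head_output_shapes(input_size: int, num_classes: int, strides=(8, 16, 32)):
    """(height, width, channels) of the head output at each detection scale."""
    channels = head_output_channels(num_classes)
    return [(input_size // s, input_size // s, channels) for s in strides]


if __name__ == "__main__":
    # 80 classes (as in COCO) and a 416x416 input image.
    for shape in head_output_shapes(416, 80):
        print(shape)
```

With 80 classes each detection layer has 3 * (4 + 1 + 80) = 255 channels, so both localization and classification are read from the same output tensor.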

    YOLOv4 uses the same head as YOLOv3. In other words, it reuses only YOLOv3's detection blocks, not the entire 106-layer network (Darknet53 plus head).

    To summarize, YOLOv4 has three main parts:

    1. Backbone --> CSPDarknet53
    2. Neck (Connects the backbone with the head) --> SPP, PAN
    3. Head --> YOLOv3's Head
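
The three-part summary above can be sketched as a small data structure that shows which parts the two models share. This is only an illustration: the `Detector` class and `shared_parts` helper are hypothetical, and YOLOv3's neck is left empty to match the two-part description given in this answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detector:
    name: str
    backbone: str
    neck: tuple  # modules between backbone and head; empty if none are listed
    head: str

    def pipeline(self) -> str:
        # Data flows backbone -> neck modules (in order) -> head.
        return " --> ".join([self.backbone, *self.neck, self.head])


# Part names taken from the answer; the neck of YOLOv3 is an assumption (empty).
YOLOV3 = Detector("YOLOv3", backbone="Darknet53", neck=(), head="YOLOv3 head")
YOLOV4 = Detector("YOLOv4", backbone="CSPDarknet53", neck=("SPP", "PAN"), head="YOLOv3 head")


def shared_parts(a: Detector, b: Detector) -> list:
    """Names of the parts that are identical in both detectors."""
    return [p for p in ("backbone", "neck", "head") if getattr(a, p) == getattr(b, p)]


if __name__ == "__main__":
    print(YOLOV4.pipeline())
    print(shared_parts(YOLOV3, YOLOV4))
```

Only the head appears in the shared list, which is the point of the answer: YOLOv4 borrows YOLOv3's head, while the backbone and neck are different.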

    References: