Search code examples
algorithmimagedeep-learningyolov5

Yolov5: image detection without segmentation?


I have read a number of papers on Yolov5 images detection techniques. But the papers don't refers to any segmentation step done by Yolov5. While I know that it is not possible to do image classification without a segmentation process, I am asking the following question: do Yolov5 do any segmentation step in order to detect images? If yes which segmentation algorithm does it use?


Solution

  • segmentation mainly uses Fully Convolutional Network(FCN) architecture. FCN is a CNN without fully connected layers(FC). segmenation can be thought as an encoder followed by a decoder. Here encoder and decoder is FCN.

    classification using CNN is a set of convolutional layers(extract high level features of input image) followed by one or more fully connected(FC) layers or dense layers.Last dense/FC layer classify the input image into various classes.

    YOLO is a regression based object detection algorithm based on CNN architecture.In YOLO image is split or segmented into S * S grid cells.Each grid cell predict only one object that means a cell tries to predict an object whose centre falls inside that cell. For each grid cell CNN predicts

    • B number of bounding boxes(x,y,w,h).(x,y) is the centre of a bounding box relative to cell location.Confidence score of each predicted bounding box is also calculated.Confidence score of each bounding box is the IOU of predicted bounding box and ground truth bounding box.Confidence score represent how likely the bounding box contains an object
    • C conditional class probabilities for each grid cell(one per class). Conditional class probability means probability of detected objects belongs to a class.

    Shape of prediction/output of CNN will be (S , S, (B * 5 + C)) ; number 5 represent x_center,y_center,width,height of bounding box and its confidence score

    If an image is divided into 7 * 7 grid cells , 2 bounding boxes are predicted for each cell and the total number of classes are 3, then shape of CNN output will be (7,7,13)

    Source