python numpy keras artificial-intelligence yolo

YOLO from scratch: dataset labels do not match the model output

Hi, I coded a YOLO model from scratch and just realised that my dataset does not fit the model's output. This is what I mean: the model outputs an S x S x (B * 5 + C) tensor, but the shape of y[0] (the label for the first image) is (7, 5). How do I make the model use my labels? From what I have read, YOLO labels come in the format x, y, w, h, objectness score, class scores, so how can the model output a 3D tensor while the labels are a 2D matrix?

How can I solve this using numpy and keras?

Solution

According to the paper (section 2), the S x S x (B * 5 + C) output corresponds to the S x S grid cells that YOLOv1 splits the image into. Each cell predicts B bounding boxes with 5 values each (x, y, w, h, confidence) plus C class probabilities. The last layer can be implemented as a fully connected layer with S * S * (B * 5 + C) output units, and you can then simply reshape its output to a 3D shape.
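As a rough illustration of the reshape step, here is a minimal numpy sketch. The values S = 7, B = 2, C = 20 are the ones the paper uses for PASCAL VOC and are only an assumption about your setup; the random array stands in for whatever your final fully connected layer produces.

```python
import numpy as np

# Assumed values (from the paper's PASCAL VOC setup); adjust to your model.
S, B, C = 7, 2, 20
depth = B * 5 + C  # values predicted per grid cell

# Stand-in for the flat output of the last fully connected layer, batch of 4 images.
flat = np.random.rand(4, S * S * depth).astype(np.float32)

# Reshape every flat vector into an S x S grid with `depth` values per cell.
grid = flat.reshape(-1, S, S, depth)
print(grid.shape)
```

In Keras the same idea would typically be a `Dense(S * S * depth)` layer followed by `Reshape((S, S, depth))`.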

The paper states that:

"Our system divides the input image into an S × S grid. If the center of an object falls into a grid cell, that grid cell is responsible for detecting that object."

This means you have to assign each label to its corresponding grid cell, turning the 2D list of boxes into an S x S grid target, so that the loss can be computed and backpropagated. For reference, a keras/tensorflow implementation of the loss calculation can be found here (by the github user FMsunyh).
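Assigning labels to grid cells could look roughly like the numpy sketch below. It rests on several assumptions that may not match your data: each label row is (x_center, y_center, w, h, class_id) with coordinates normalised to [0, 1], and the target layout per cell is [x, y, w, h, objectness, one-hot classes]. The function name `encode_labels` is hypothetical, and the layout has to agree with whatever loss function you end up using.

```python
import numpy as np

def encode_labels(boxes, S=7, C=20):
    """Turn an (N, 5) array of boxes into an S x S x (5 + C) target grid.

    Assumed row format: (x_center, y_center, w, h, class_id),
    with all coordinates normalised to [0, 1] relative to the image.
    """
    target = np.zeros((S, S, 5 + C), dtype=np.float32)
    for x, y, w, h, cls in boxes:
        # Grid cell that contains the box centre (clamped for x or y == 1.0).
        col = min(int(x * S), S - 1)
        row = min(int(y * S), S - 1)
        if target[row, col, 4] == 1:
            continue  # cell already taken; this sketch keeps the first object only
        # Centre offset relative to the cell; w and h stay relative to the image.
        target[row, col, 0] = x * S - col
        target[row, col, 1] = y * S - row
        target[row, col, 2:4] = (w, h)
        target[row, col, 4] = 1.0              # objectness
        target[row, col, 5 + int(cls)] = 1.0   # one-hot class
    return target

# Two made-up boxes, purely for illustration.
boxes = np.array([[0.5, 0.5, 0.2, 0.3, 3],
                  [0.1, 0.9, 0.1, 0.1, 0]], dtype=np.float32)
y_true = encode_labels(boxes)
print(y_true.shape)
```

Applying this to every image and stacking the results with `np.stack` gives a label array of shape (num_images, S, S, 5 + C). The loss then compares it against the B predicted boxes of each cell.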