Hi, I coded a YOLO model from scratch and just realised that my dataset does not fit the model's output. This is what I mean:
The model outputs an S x S x (B * 5 + C) matrix.
The shape of y[0] (the label for the first image) is (7, 5).
How can I make the model use my labels?
From what I have read, labels for the YOLO algorithm come in the format x, y, w, h, objectness_score, class_scores, so how can the model output a 3D matrix while the labels are a 2D matrix?
How can I solve this using numpy and keras?
According to the paper (section 2), the S x S x (B * 5 + C) shaped output represents the S x S grid cells that YOLOv1 splits the image into; each cell predicts B boxes (x, y, w, h, confidence) plus C class probabilities. The last layer can be implemented as a fully connected layer with S * S * (B * 5 + C) output units, and you can then simply reshape its output to a 3D shape.
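As a rough illustration of that reshape step, here is a minimal numpy sketch. The values S = 7, B = 2, C = 20 are the ones the paper uses for PASCAL VOC; the random array is only a stand-in for the real output of the fully connected layer, and the batch size of 4 is an arbitrary assumption.

```python
import numpy as np

S, B, C = 7, 2, 20      # grid size, boxes per cell, number of classes
depth = B * 5 + C       # 30 values per grid cell

# Stand-in for the flat output of the last fully connected layer
# for a batch of 4 images (random values, for illustration only).
batch_size = 4
flat_output = np.random.rand(batch_size, S * S * depth)

# Reshape each flat vector into an S x S x (B * 5 + C) grid.
grid_output = flat_output.reshape(batch_size, S, S, depth)
print(grid_output.shape)  # (4, 7, 7, 30)
```

In a Keras model the same step would typically be a Dense layer with S * S * depth units followed by a Reshape layer with target shape (S, S, depth).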
The paper states that:
"Our system divides the input image into an S × S grid. If the center of an object falls into a grid cell, that grid cell is responsible for detecting that object."
This means you have to assign each label to the grid cell that contains the center of its box before you can compute the loss and backpropagate. For reference, a keras/tensorflow implementation of the loss calculation can be found here (by the GitHub user FMsunyh).
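The assignment of labels to grid cells could look roughly like the numpy sketch below. It rests on assumptions that the question does not confirm: each row of y[0] is taken to be [x_center, y_center, width, height, class_id] with coordinates normalised to [0, 1], all-zero rows are treated as padding, and the target uses one common layout of 5 + C values per cell. The function name encode_labels is hypothetical, and the linked implementation may lay out its targets differently.

```python
import numpy as np

def encode_labels(boxes, S=7, C=20):
    """Convert an (N, 5) array of boxes into an (S, S, 5 + C) target grid.

    Assumed row format: [x_center, y_center, width, height, class_id],
    coordinates normalised to [0, 1]; all-zero rows are padding.
    """
    target = np.zeros((S, S, 5 + C), dtype=np.float32)
    for x, y, w, h, class_id in boxes:
        if w <= 0 or h <= 0:
            continue  # skip padding rows
        # Grid cell containing the box centre (clamped so x == 1.0 stays in range).
        col = min(int(x * S), S - 1)
        row = min(int(y * S), S - 1)
        if target[row, col, 4] == 1:
            continue  # cell already holds an object; keep one object per cell
        # Centre offset relative to the cell, width/height relative to the image.
        target[row, col, 0:4] = [x * S - col, y * S - row, w, h]
        target[row, col, 4] = 1.0                   # objectness
        target[row, col, 5 + int(class_id)] = 1.0   # one-hot class
    return target

# Example label array with the (7, 5) shape from the question.
y0 = np.zeros((7, 5), dtype=np.float32)
y0[0] = [0.5, 0.5, 0.2, 0.3, 3]   # box centred in the image, class 3
y0[1] = [0.1, 0.9, 0.1, 0.1, 0]   # box near the bottom-left corner, class 0
target = encode_labels(y0)
print(target.shape)  # (7, 7, 25)
```

For a whole dataset, np.stack([encode_labels(y_i) for y_i in y]) would give an array of shape (num_images, S, S, 5 + C), which a custom loss can then compare against the reshaped model output.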