Here's a tutorial about doing custom training of YOLO (Darknet):
The tutorial explains how to set the values in the .cfg file.
Why is it 'plus 5' then 'times 3'?
Some say it's (classes + coords + 1) * num, but I can't work out what that means.
I've found the answer:
filters = (classes + 5) * 3
= (classes + width + height + x + y + confidence) * num
= (classes + 1+1+1+1+1) * num
= (classes + 5) * num
YOLOv3 detects 3 boxes per grid cell, so num = 3:
filters = (classes + 5) * 3
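
For a quick sanity check of the arithmetic, here is a minimal Python sketch of the formula. The function name yolo_filters and its parameter defaults are my own, for illustration only; they are not part of Darknet. It assumes 4 box coordinates (x, y, width, height), 1 confidence score, and 3 boxes per grid cell, as described above.

```python
def yolo_filters(classes: int, coords: int = 4, num: int = 3) -> int:
    """Hypothetical helper: filters = (classes + coords + 1) * num.

    coords = 4 covers x, y, width and height; the extra 1 is the
    confidence score; num is the number of boxes per grid cell.
    """
    if classes < 1:
        raise ValueError("classes must be at least 1")
    return (classes + coords + 1) * num


print(yolo_filters(1))   # 1 class:    (1 + 5) * 3  = 18
print(yolo_filters(20))  # 20 classes: (20 + 5) * 3 = 75
print(yolo_filters(80))  # 80 classes: (80 + 5) * 3 = 255
```

So a single-class custom model would use filters=18, and an 80-class model would use filters=255.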