Seeking help regarding image annotation formats for the Object Detection API.
- Background:
As we know, there are two common annotation formats for images: Pascal VOC and COCO. Each has its own specification. Here is the main difference between them:
Pascal VOC:
COCO:
- Current issue:
I have two datasets to deal with, and this is how they are annotated:
Dataset-1:
Dataset-2:
What I cannot figure out is which format (Pascal VOC or COCO) I should follow to convert my annotations into TFRecords (.xml to .records), since, as you can see, the annotations in my datasets do not belong purely to either format.
For instance, in this link the author wrote a script to convert .xml into .records, but it deals with the pure Pascal VOC format.
And in this link they deal with the pure COCO annotation format.
Which path should I follow, given that my annotations fall somewhere between the two formats?
Use the Pascal VOC format to convert .xml into .records.
Make the following changes in the create_tf_example function of the script at this link:
# imgwidth and imgheight are the image dimensions in pixels
for index, row in group.TextLine.iterrows():
    # Convert (X, Y, Width, Height) to normalized corner coordinates
    xmin.append(row['X']/imgwidth)
    xmax.append((row['X']+row['Width'])/imgwidth)
    ymin.append(row['Y']/imgheight)
    ymax.append((row['Y']+row['Height'])/imgheight)
    classes_text.append(row['class'].encode('utf8'))
    classes.append(class_text_to_int(row['class']))
This applies when your .xml annotations store X, Y, Width, Height instead of xmin, ymin, xmax, ymax.
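To make the arithmetic concrete, here is a minimal standalone sketch of the same conversion. It assumes X and Y are the top-left corner of the box in pixels; the function name xywh_to_normalized_corners and the sample numbers are hypothetical and only for illustration, they are not part of the linked script.

```python
def xywh_to_normalized_corners(x, y, width, height, img_width, img_height):
    """Convert a top-left/size box in pixels to normalized corners.

    Assumes (x, y) is the top-left corner of the box. Returns
    (xmin, ymin, xmax, ymax), each divided by the image size.
    """
    if img_width <= 0 or img_height <= 0:
        raise ValueError("image dimensions must be positive")
    xmin = x / img_width
    ymin = y / img_height
    xmax = (x + width) / img_width
    ymax = (y + height) / img_height
    return xmin, ymin, xmax, ymax


# Hypothetical box: X=50, Y=20, Width=100, Height=40 in a 200x100 image
box = xywh_to_normalized_corners(50, 20, 100, 40, img_width=200, img_height=100)
print(box)  # (0.25, 0.2, 0.75, 0.6)
```

If any normalized value falls outside the range 0 to 1, the box extends past the image border, so it is worth checking your annotations before writing the records.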