python, xml, computer-vision, yolo

How to automatically get every object of the same type from an XML file?


I am trying to parse an XML file and convert it into a TXT file. This is what my XML file looks like:

<annotation>
  <folder>training</folder>
  <filename>106310488.jpg</filename>
  <source>
    <database>synthetic initialization</database>
    <annotation>PASCAL VOC2007</annotation>
    <image>synthetic</image>
    <flickrid>none</flickrid>
  </source>
  <owner>
    <flickrid>none</flickrid>
    <name>none</name>
  </owner>
  <size>
    <width>1024</width>
    <height>681</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>shell</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>234</xmin>
      <ymin>293</ymin>
      <xmax>281</xmax>
      <ymax>340</ymax>
    </bndbox>
  </object>
  <object>
    <name>shell</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>504</xmin>
      <ymin>302</ymin>
      <xmax>551</xmax>
      <ymax>349</ymax>
    </bndbox>
  </object>
  <object>
    <name>shell</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>776</xmin>
      <ymin>302</ymin>
      <xmax>823</xmax>
      <ymax>349</ymax>
    </bndbox>
  </object>
</annotation>

The information I am interested in is inside <object>: I want the <name> and everything inside <bndbox>. These are the names and bounding-box coordinates of the objects in a dataset. I don't know how many <object> entries (each with a <bndbox>) there are in each XML file, so I want to write logic that gets all of them.

So far, my logic only gets and processes the first occurrence of <object><bndbox></bndbox></object>. If the XML file contains any other bounding boxes, my code simply skips them. I don't want this. Here is my code:

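import os
import xml.etree.ElementTree as ET
from time import time

import cv2

annotations_dir = os.listdir('/content/darknet/logorec/openlogo/Annotations/') # assumed: the XML file names in the Annotations folder

for annotations_file in annotations_dir: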

  annotations = []
  milliseconds = int(time() * 1000)

  doc = ET.parse('/content/darknet/logorec/openlogo/Annotations/' + annotations_file) # Parsing the XML file
  
  new_annotations_file_name = annotations_file.split('.')[0] # Getting the name of the XML file without the file extension
  
  canvas = cv2.imread('/content/darknet/logorec/openlogo/JPEGImages/' + new_annotations_file_name + '.jpg') # Get the entire image
  
  canvas_shape = canvas.shape # Get the dimensions of the image
  
  root = doc.getroot() # Gets the root of the XML file
  
  annotations_box = root[6][4] # root[6] is the first <object>; [4] is its <bndbox>

  class_name = root[6][0] # <name> of the first object only
  class_name = class_name.text # Getting the text value

  for ant in annotations_box:
    annotations.append(ant.text) # Appending every single bounding box coordinate (xmin, ymin, xmax, ymax) to the list
  
  ''' These are my annotations calculations for the YOLO model'''
  logo_shape_w = int(annotations[2]) - int(annotations[0])
  logo_shape_h = int(annotations[3]) - int(annotations[1])

  x1 = int(annotations[0]) # x1 = xmin
  y1 = int(annotations[1]) # y1 = ymin (the top edge of the box)

  x2 = x1 + logo_shape_w
  y2 = y1 + logo_shape_h

  w = x2 - x1
  h = y2 - y1

  center_x = x1 + (w/2)
  center_y = y1 + (h/2)

  # cv2 image shapes are (height, width, channels): normalize x by width, y by height
  x = center_x / canvas_shape[1]
  y = center_y / canvas_shape[0]

  width = w / canvas_shape[1]
  height = h / canvas_shape[0]
  '''---------------------------------------------------------'''
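The calculation above boils down to converting Pascal VOC corner coordinates into the normalized center/size boxes that YOLO (darknet) label files use. Here is a minimal sketch of it as a standalone function, assuming pixel coordinates and a known image size; the name voc_to_yolo and the class index 0 are hypothetical. Because a cv2 image's shape is (height, width, channels), img_h corresponds to canvas.shape[0] and img_w to canvas.shape[1].

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert Pascal VOC pixel corners to a YOLO box.

    Returns (x_center, y_center, width, height), with the x values divided
    by the image width and the y values divided by the image height.
    """
    box_w = xmax - xmin
    box_h = ymax - ymin
    x_center = xmin + box_w / 2
    y_center = ymin + box_h / 2
    return (x_center / img_w, y_center / img_h, box_w / img_w, box_h / img_h)


# First box of the sample XML; the image size comes from its <size> element.
x, y, w, h = voc_to_yolo(234, 293, 281, 340, img_w=1024, img_h=681)
class_id = 0  # hypothetical: index of "shell" in the class list
print(f"{class_id} {x:.6f} {y:.6f} {w:.6f} {h:.6f}")
# 0 0.251465 0.464758 0.045898 0.069016
```

Each line of a darknet label file is the class index followed by these four numbers, so every object in the XML file would produce one such line in the TXT file.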

Solution

  • Parsing the XML with XPath (here via lxml) makes it possible to iterate over the items in objList. First, only the first item is shown:

    >>> from lxml import etree
    >>> tree = etree.parse('test.xml')
    >>> objList = tree.xpath('//object')
    >>> bnd = objList[0].xpath('name | bndbox/*')
    >>> for e in bnd:
    ...     e.text
    ... 
    'shell'
    '234'
    '293'
    '281'
    '340'
    

    Iterating over all objects:

    >>> for obj in objList:
    ...      bnd = obj.xpath('name | bndbox/*')
    ...      for e in bnd:
    ...          e.text
    ... 
    'shell'
    '234'
    '293'
    '281'
    '340'
    'shell'
    '504'
    '302'
    '551'
    '349'
    'shell'
    '776'
    '302'
    '823'
    '349'
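The same iteration also works with the standard library's xml.etree.ElementTree, which the question's code already imports, so lxml is not strictly required. In the sketch below, read_objects and SAMPLE are hypothetical names (SAMPLE is a trimmed copy of the question's XML). findall('object') returns every <object> child of <annotation>, and the coordinates are looked up by tag name rather than by position.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML (only the elements used here).
SAMPLE = """<annotation>
  <size><width>1024</width><height>681</height><depth>3</depth></size>
  <object>
    <name>shell</name>
    <bndbox><xmin>234</xmin><ymin>293</ymin><xmax>281</xmax><ymax>340</ymax></bndbox>
  </object>
  <object>
    <name>shell</name>
    <bndbox><xmin>504</xmin><ymin>302</ymin><xmax>551</xmax><ymax>349</ymax></bndbox>
  </object>
  <object>
    <name>shell</name>
    <bndbox><xmin>776</xmin><ymin>302</ymin><xmax>823</xmax><ymax>349</ymax></bndbox>
  </object>
</annotation>"""


def read_objects(root):
    """Return (name, xmin, ymin, xmax, ymax) for every <object> under root."""
    objects = []
    # findall('object') matches every direct <object> child, however many
    # there are; a fixed index such as root[6] reaches only one of them.
    for obj in root.findall('object'):
        bndbox = obj.find('bndbox')
        coords = [int(bndbox.findtext(tag)) for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
        objects.append((obj.findtext('name'), *coords))
    return objects


root = ET.fromstring(SAMPLE)  # for a file on disk: ET.parse(path).getroot()
img_w = int(root.findtext('size/width'))
img_h = int(root.findtext('size/height'))
print(img_w, img_h)  # 1024 681
for row in read_objects(root):
    print(row)
# ('shell', 234, 293, 281, 340)
# ('shell', 504, 302, 551, 349)
# ('shell', 776, 302, 823, 349)
```

In the question's loop, root = doc.getroot() could be passed to read_objects, and each returned tuple converted and written as one line of the TXT file.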