Tags: python-3.x, findall, xml.etree, draw.io

How to use etree findall() across several levels


I am trying to manipulate an uncompressed XML drawing exported from diagrams.net (formerly draw.io).

Cables can be hooked up to elements, and I want to get a list of cables. I search for all cables by testing whether an element has both source and target attributes. Then I compare both IDs against the full list of elements to find the label (the value attribute) of each connected element.
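A minimal sketch of this approach with Python's standard xml.etree.ElementTree module, assuming every cell is still a plain <mxCell> (no <object> wrapper yet). The sample model, its IDs and labels, and the helper name cable_list are hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down draw.io model: two labelled vertices and one edge.
SAMPLE = """
<mxGraphModel>
  <root>
    <mxCell id="0" />
    <mxCell id="1" parent="0" />
    <mxCell id="A" value="Switch" vertex="1" parent="1" />
    <mxCell id="B" value="Server" vertex="1" parent="1" />
    <mxCell id="E1" value="" edge="1" parent="1" source="A" target="B" />
  </root>
</mxGraphModel>
"""


def cable_list(root_node):
    """Return (cable_id, source_label, target_label) for every plain edge."""
    # Map every element that has an id to its label (the value attribute).
    labels = {
        cell.get("id"): cell.get("value")
        for cell in root_node.iter()
        if cell.get("id") is not None
    }
    cables = []
    # An edge is any element carrying both source and target attributes.
    for edge in root_node.findall(".//*[@source][@target]"):
        cables.append(
            (edge.get("id"), labels.get(edge.get("source")), labels.get(edge.get("target")))
        )
    return cables


print(cable_list(ET.fromstring(SAMPLE)))  # [('E1', 'Switch', 'Server')]
```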

That works great until someone adds an "addon-tag". After that (even if the tag is deleted again), the element gets wrapped in an <object> that holds the id attribute, while the source and target attributes stay on a child <mxCell> node, like this:

before:

<mxCell id="ferXMembXyNwfAPwV5vA-22" value="" style="..endless list" edge="1" parent="1" source="ferXMembXyNwfAPwV5vA-8" target="ferXMembXyNwfAPwV5vA-18">
  <mxGeometry relative="1" as="geometry">
    <mxPoint x="540" y="520" as="sourcePoint" />
    <mxPoint x="700" y="520" as="targetPoint" />
  </mxGeometry>
</mxCell>

after:

<object label="" id="ferXMembXyNwfAPwV5vA-53">
  <mxCell style="..endless long list" edge="1" parent="1" source="ferXMembXyNwfAPwV5vA-42" target="ferXMembXyNwfAPwV5vA-51">
    <mxGeometry relative="1" as="geometry">
      <mxPoint x="660" y="340" as="sourcePoint" />
      <mxPoint x="770" y="360" as="targetPoint" />
    </mxGeometry>
  </mxCell>
</object>

This findall works for plain mxCell edges and returns the elements that carry the id, source and target attributes:

list_of_mxCell_elements = root_node.findall(".//*[@source][@target]")
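Because .// matches descendants at any depth, this path also reaches the inner <mxCell> of a wrapped edge; what is missing is only its id, which now lives on the <object>. A quick check, assuming the before and after edges from the question are placed side by side in one hypothetical model (styles and geometry dropped):

```python
import xml.etree.ElementTree as ET

# The "before" and "after" edges from the question, trimmed and placed in one model.
SAMPLE = """
<mxGraphModel>
  <root>
    <mxCell id="ferXMembXyNwfAPwV5vA-22" edge="1" parent="1"
            source="ferXMembXyNwfAPwV5vA-8" target="ferXMembXyNwfAPwV5vA-18" />
    <object label="" id="ferXMembXyNwfAPwV5vA-53">
      <mxCell edge="1" parent="1"
              source="ferXMembXyNwfAPwV5vA-42" target="ferXMembXyNwfAPwV5vA-51" />
    </object>
  </root>
</mxGraphModel>
"""

root_node = ET.fromstring(SAMPLE)
edges = root_node.findall(".//*[@source][@target]")

# Both edges are found, wrapped or not ...
print([e.get("source") for e in edges])  # ['ferXMembXyNwfAPwV5vA-8', 'ferXMembXyNwfAPwV5vA-42']
# ... but the wrapped one carries no id of its own.
print([e.get("id") for e in edges])  # ['ferXMembXyNwfAPwV5vA-22', None]
```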

This one returns the <object> elements instead, so I can read their IDs:

list_of_objects_elements = root_node.findall(".//*[@source][@target]/..")
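The .. step returns the parent of each match, and ElementTree reports each distinct parent only once. In a typical draw.io model the plain edges sit directly under a shared <root> element, so that element shows up in the result too, next to the <object> wrappers. A small check on a hypothetical model (the e1 to e3 and v1 to v4 IDs are made up):

```python
import xml.etree.ElementTree as ET

# Hypothetical model with two plain edges and one edge wrapped in <object>.
SAMPLE = """
<mxGraphModel>
  <root>
    <mxCell id="e1" edge="1" parent="1" source="v1" target="v2" />
    <mxCell id="e2" edge="1" parent="1" source="v2" target="v3" />
    <object label="" id="e3">
      <mxCell edge="1" parent="1" source="v3" target="v4" />
    </object>
  </root>
</mxGraphModel>
"""

root_node = ET.fromstring(SAMPLE)
parents = root_node.findall(".//*[@source][@target]/..")

# ".." yields each distinct parent once, in document order.
print([p.tag for p in parents])  # ['root', 'object']
# Keeping only the <object> wrappers gives the ids the inner cells lack.
print([p.get("id") for p in parents if p.tag == "object"])  # ['e3']
```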

But how can I access the mxCell element from the list_of_objects_elements, so I can get hold of the source and target IDs?
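One way to get from a wrapped edge back to its id is to work bottom-up: find every edge <mxCell> and look up its parent. ElementTree elements have no parent pointer, so this sketch builds a child-to-parent dictionary first. The sample model and the function name edges_with_ids are hypothetical, and it assumes the wrapper tag is exactly object, as in the snippet above:

```python
import xml.etree.ElementTree as ET

# Hypothetical model: e1 is a plain edge, e2 is wrapped in <object>.
SAMPLE = """
<mxGraphModel>
  <root>
    <mxCell id="e1" edge="1" parent="1" source="v1" target="v2" />
    <object label="" id="e2">
      <mxCell edge="1" parent="1" source="v2" target="v3" />
    </object>
  </root>
</mxGraphModel>
"""


def edges_with_ids(root_node):
    """Yield (edge_id, source, target) for plain and <object>-wrapped edges."""
    # ElementTree elements have no parent pointer, so map child -> parent once.
    parent_of = {child: parent for parent in root_node.iter() for child in parent}
    for cell in root_node.iterfind(".//mxCell[@source][@target]"):
        parent = parent_of.get(cell)
        # A wrapped edge keeps source/target on <mxCell> but its id on <object>.
        holder = parent if parent is not None and parent.tag == "object" else cell
        yield holder.get("id"), cell.get("source"), cell.get("target")


for edge_id, source, target in edges_with_ids(ET.fromstring(SAMPLE)):
    print(edge_id, source, target)
# e1 v1 v2
# e2 v2 v3
```

If lxml is used instead of the standard library, its elements provide getparent(), which would replace the dictionary.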


Solution

  • I found a solution on my own.

    After the first findall, I iterate over the returned elements and simply run another findall on each cable element I got.

    It looks somewhat like this:

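    import xml.etree.ElementTree as ET

    root_node = ET.parse("drawing.xml").getroot()  # hypothetical file name
    list_of_objects_elements = root_node.findall('.//*[@source][@target]/..')
    for cable in list_of_objects_elements:
        for mxCell in cable.findall('./*[@source][@target]'):
            # cable is the parent element (the <object> wrapper for wrapped edges)
            source_id, target_id = mxCell.get('source'), mxCell.get('target')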

    Note the slightly different findall paths:

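    .//*[@source][@target]/..

    The first path searches the whole tree (.// matches descendants at any depth) for elements that have both source and target attributes, i.e. the <mxCell> edges, and then steps up one level with .. to return their parent elements, such as the <object> wrapper.

    ./*[@source][@target]

    The second path only looks at the direct children (./*) of the element it is called on, so on an <object> it returns the <mxCell> one level deeper.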

    To me, those paths are still mind-blowing.
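The difference between the two paths shows up when both are run on a hypothetical model with one plain and one wrapped edge. Called on the top element, ./* only looks at its direct children and finds nothing. The combined loop from the answer does find both edges, but for the plain edge the parent returned by .. is the shared <root> element, so this sketch checks the tag before reading the id (IDs made up for illustration):

```python
import xml.etree.ElementTree as ET

# Hypothetical model: e1 is a plain edge, e2 is wrapped in <object>.
SAMPLE = """
<mxGraphModel>
  <root>
    <mxCell id="e1" edge="1" parent="1" source="v1" target="v2" />
    <object label="" id="e2">
      <mxCell edge="1" parent="1" source="v2" target="v3" />
    </object>
  </root>
</mxGraphModel>
"""

root_node = ET.fromstring(SAMPLE)

# "./*" only looks at direct children of the element it is called on.
print(root_node.findall("./*[@source][@target]"))  # [] (edges sit two levels down)

cables = []
for cable in root_node.findall(".//*[@source][@target]/.."):
    for mxCell in cable.findall("./*[@source][@target]"):
        # For plain edges "cable" is the shared <root> element, so take the id
        # from the <object> only when there is one.
        cable_id = cable.get("id") if cable.tag == "object" else mxCell.get("id")
        cables.append((cable_id, mxCell.get("source"), mxCell.get("target")))

print(cables)  # [('e1', 'v1', 'v2'), ('e2', 'v2', 'v3')]
```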