python xml minidom

Find element with attribute with minidom


Given

<field name="frame.time_delta_displayed" showname="Time delta from previous displayed frame: 0.000008000 seconds" size="0" pos="0" show="0.000008000"/>
<field name="frame.time_relative" showname="Time since reference or first frame: 0.000008000 seconds" size="0" pos="0" show="0.000008000"/>
<field name="frame.number" showname="Frame Number: 2" size="0" pos="0" show="2"/>
<field name="frame.pkt_len" showname="Packet Length: 1506 bytes" hide="yes" size="0" pos="0" show="1506"/>
<field name="frame.len" showname="Frame Length: 1506 bytes" size="0" pos="0" show="1506"/>
<field name="frame.cap_len" showname="Capture Length: 1506 bytes" size="0" pos="0" show="1506"/>
<field name="frame.marked" showname="Frame is marked: False" size="0" pos="0" show="0"/>
<field name="frame.protocols" showname="Protocols in frame: eth:ip:tcp:http:data" size="0" pos="0" show="eth:ip:tcp:http:data"/>

How do I get the field with name="frame.len" right away without iterating through every tag and checking the attributes?


Solution

  • I don't think you can.

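    From the parent element, you need to iterate over the <field> children and compare each one's name attribute:

    for subelement in element.getElementsByTagName("field"):
        # "frame.len" is the value of the name attribute, not an attribute itself
        if subelement.getAttribute("name") == "frame.len":
            do_something()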
    

    Reacting to your comment from March 11, if the structure of your documents is stable and free of nasty surprises (like angle brackets inside attributes), you might want to try the unthinkable and use a regular expression. This is not recommended practice but could work and be much easier than actually parsing the file. I admit that I've done that sometimes myself. Haven't gone blind yet.

    So in your case you could scan the file line by line (assuming that a <field> tag doesn't span multiple lines):

    import re

    with open("myfile.xml") as xmlfile:
        for line in xmlfile:
            # Dots are escaped so they match only a literal "."
            match = re.search(r'<field\s+name="frame\.len"\s+([^>]+)/>', line)
            if match:
                result = match.group(1)
                do_something(result)
    

    If a <field> tag can span multiple lines, you could try loading the entire file as plain text into memory and then scan it for matches:

    import re

    with open("myfile.xml") as xmlfile:
        filedump = xmlfile.read()
    for match in re.finditer(r'<field\s+name="frame\.len"\s+([^>]+)/>', filedump):
        result = match.group(1)
        do_something(result)
    

    In both cases, result will contain the rest of the tag, that is, every attribute other than name, as one raw string. The regex assumes that name="frame.len" is always the first attribute inside the tag.
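
To make the contents of result concrete, here is a small sketch that runs the same pattern against the frame.len line from the question and splits the captured text into a dict. The second pattern, ATTR_RE, is an assumption added for illustration; it only handles double-quoted attribute values with no embedded quotes.

```python
import re

# The frame.len line from the question.
LINE = '<field name="frame.len" showname="Frame Length: 1506 bytes" size="0" pos="0" show="1506"/>'

# Same pattern as above, with the dot escaped.
FIELD_RE = re.compile(r'<field\s+name="frame\.len"\s+([^>]+)/>')
# Hypothetical helper pattern: one key="value" pair per match.
ATTR_RE = re.compile(r'([\w.:-]+)="([^"]*)"')

match = FIELD_RE.search(LINE)
# group(1) holds everything between name="frame.len" and the closing "/>".
attrs = dict(ATTR_RE.findall(match.group(1))) if match else {}
print(attrs["show"])  # prints 1506
```

Note that the name attribute itself is not in attrs, because the capture group starts after it.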