Search code examples
pythondocxpython-docx

Access Run nested elements - Python Docx


This is the xml extraced from the word document

<w:r>
    <w:rPr>
      <w:rtl/>
    </w:rPr>
    <w:fldChar w:fldCharType="begin">
      <w:ffData>
        <w:name w:val="נפתח1"/>
        <w:enabled/>
        <w:calcOnExit w:val="0"/>
        <w:ddList>
          <w:listEntry w:val="A"/>
          <w:listEntry w:val="B"/>
          <w:listEntry w:val="C"/>
          <w:listEntry w:val="D"/>
          <w:listEntry w:val="E"/>
        </w:ddList>
      </w:ffData>
    </w:fldChar>
</w:r>

I'm trying to get to the <w:ddList> sections, but I cant get past the <w:r> section

I know its possible to get the run element the following way

run = doc.paragraphs[0].runs[0]

but runs doesn't have any list like attribute, so I'm about clueless how to continue, any help will be appreciated.


Solution

  • What you're looking for is run._r, which is the lxml.etree._Element (subtype) object representing the <w:r> element.

    From there you can use XPath expressions to get to what you're after, like:

    r = run._r
    fldChars = r.xpath("./w:fldChar")
    

    The return value of a .xpath() call will be a list of zero or more objects matching the XPath expression.

    python-docx-implemented element objects, so-called oxml objects with type names like CT_Run, have a fancier .xpath() implementation that delivers you from having to fuss over namespaces. The run._r object is one of these. If you invoke for example fldChars[0].xpath(...) (fldChart would be a "plain" lxml.etree._Element object) then you need to provide Clark-name style namespaces for each element, which is ugly at best.

    The lxml documentation has more on that, but if you can "root" your XPath search on a python-docx element like run._r then you can avoid that, using like r.xpath("./w:fldChar/w:ffData") to get at those child elements for example, rather than starting at the w:fldChar element.