This is the xml extraced from the word document
<w:r>
<w:rPr>
<w:rtl/>
</w:rPr>
<w:fldChar w:fldCharType="begin">
<w:ffData>
<w:name w:val="נפתח1"/>
<w:enabled/>
<w:calcOnExit w:val="0"/>
<w:ddList>
<w:listEntry w:val="A"/>
<w:listEntry w:val="B"/>
<w:listEntry w:val="C"/>
<w:listEntry w:val="D"/>
<w:listEntry w:val="E"/>
</w:ddList>
</w:ffData>
</w:fldChar>
</w:r>
I'm trying to get to the <w:ddList>
sections, but I cant get past the <w:r>
section
I know its possible to get the run element the following way
run = doc.paragraphs[0].runs[0]
but runs doesn't have any list like attribute, so I'm about clueless how to continue, any help will be appreciated.
What you're looking for is run._r
, which is the lxml.etree._Element
(subtype) object representing the <w:r>
element.
From there you can use XPath expressions to get to what you're after, like:
r = run._r
fldChars = r.xpath("./w:fldChar")
The return value of a .xpath()
call will be a list of zero or more objects matching the XPath expression.
python-docx
-implemented element objects, so-called oxml
objects with type names like CT_Run
, have a fancier .xpath()
implementation that delivers you from having to fuss over namespaces. The run._r
object is one of these. If you invoke for example fldChars[0].xpath(...)
(fldChart
would be a "plain" lxml.etree._Element
object) then you need to provide Clark-name style namespaces for each element, which is ugly at best.
The lxml
documentation has more on that, but if you can "root" your XPath search on a python-docx
element like run._r
then you can avoid that, using like r.xpath("./w:fldChar/w:ffData")
to get at those child elements for example, rather than starting at the w:fldChar
element.