How to remove section breaks from a Word document using python-docx

I am trying to remove the section breaks from a Word document. To do this, I am trying to remove the sectPr attribute from the XML generated by python-docx. This is the XML that is generated:

<w:document xmlns:wpc="" xmlns:cx="" xmlns:cx1="" xmlns:cx2="" xmlns:cx3="" xmlns:cx4="" xmlns:cx5="" xmlns:cx6="" xmlns:cx7="" xmlns:cx8="" xmlns:mc="" xmlns:aink="" xmlns:am3d="" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="" xmlns:m="" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="" xmlns:wp="" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="" xmlns:w14="" xmlns:w15="" xmlns:w16cex="" xmlns:w16cid="" xmlns:w16="" xmlns:w16se="" xmlns:wpg="" xmlns:wpi="" xmlns:wne="" xmlns:wps="" mc:Ignorable="w14 w15 w16se w16cid w16 w16cex wp14">
  <w:body>
    <w:p w14:paraId="0F1E22A8" w14:textId="1CB95B52" w:rsidR="006F7C29" w:rsidRDefault="00B46A6B">
      <w:pPr>
        <w:sectPr w:rsidR="006F7C29">
          <w:pgSz w:w="11906" w:h="16838"/>
          <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="708" w:footer="708" w:gutter="0"/>
          <w:cols w:space="708"/>
          <w:docGrid w:linePitch="360"/>
        </w:sectPr>
      </w:pPr>
    </w:p>
    <w:p w14:paraId="3FE55637" w14:textId="789D24FC" w:rsidR="003660CC" w:rsidRPr="003660CC" w:rsidRDefault="003660CC" w:rsidP="008F17C5"/>
    <w:sectPr w:rsidR="003660CC" w:rsidRPr="003660CC" w:rsidSect="008F17C5">
      <w:type w:val="evenPage"/>
      <w:pgSz w:w="11906" w:h="16838"/>
      <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="708" w:footer="708" w:gutter="0"/>
      <w:cols w:space="708"/>
      <w:docGrid w:linePitch="360"/>
    </w:sectPr>
  </w:body>
</w:document>

I have written the following code to remove sectPr:

def identifySbr(doc):
    document_xml = doc.element.xml

    allp = len(doc.paragraphs)
    for i in range(0,allp):
        c = doc.paragraphs[i]._p.xpath("./w:pPr/w:sectPr")

        if len(c)>0:
            ca = doc.paragraphs[i]._p.xpath("./w:pPr/w:sectPr")[0]

But I am getting this error:

  File "src\lxml\etree.pyx", line 2449, in lxml.etree._Attrib.pop
KeyError: '{}sectPr'

Can anybody please help me resolve this?


  • The <w:sectPr> item you are trying to remove is an element, not an attribute (of an element). So the error message is telling you that the w:sectPr element has no w:sectPr attribute, which of course it doesn't.
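To make the element-versus-attribute distinction concrete, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra packages, and the one-paragraph XML snippet is a made-up stand-in for the document in the question. With lxml, which python-docx is built on, the removal step would be sectPr.getparent().remove(sectPr) instead.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, bound to the "w" prefix.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ns = {"w": W}

# Hypothetical one-paragraph snippet: w:sectPr is a child ELEMENT of w:pPr,
# and w:rsidR is an ATTRIBUTE of w:sectPr.
xml = f'<w:p xmlns:w="{W}"><w:pPr><w:sectPr w:rsidR="006F7C29"/></w:pPr></w:p>'
p = ET.fromstring(xml)
pPr = p.find("w:pPr", ns)
sectPr = pPr.find("w:sectPr", ns)

# The only attribute on w:sectPr is w:rsidR; there is no "sectPr" attribute.
attribute_names = list(sectPr.attrib)

# Popping a non-existent attribute raises KeyError, as in the question.
try:
    sectPr.attrib.pop(f"{{{W}}}sectPr")
    pop_failed = False
except KeyError:
    pop_failed = True

# Removing the element itself is done through its parent.
pPr.remove(sectPr)
remaining = pPr.findall("w:sectPr", ns)
```

After this runs, pop_failed is True and remaining is an empty list: the attribute lookup fails, while removing the child element from its parent succeeds.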

    I think what you're looking for is something like this:

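    def remove_all_but_last_section(document):
        for paragraph in document.paragraphs:
            p = paragraph._p
            sectPrs = p.xpath("./w:pPr/w:sectPr")
            if not sectPrs:
                continue  # this paragraph does not end a section
            sectPr = sectPrs[0]
            sectPr.getparent().remove(sectPr)  # remove the element from its w:pPr parent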

    An alternative implementation which is perhaps a bit more elegant and definitely would perform better (although it would probably be very fast either way unless the document was huge):

    def remove_all_but_last_section(document):
        sectPrs = document._element.xpath(".//w:pPr/w:sectPr")
        for sectPr in sectPrs:
            sectPr.getparent().remove(sectPr)