I am trying to remove the last page from a Word document, but haven't found a solution yet. More precisely, I want to remove a section from the document.
document.sections[-1]
This can be used to access the last section, but how can I remove it?
Unfortunately, the short answer seems to be that you can't do this with python-docx, at least not through its public API. If you dug down into the underlying XML you could probably hack together something that works for your specific case, but from the 10-15 minutes of research I did, this doesn't appear to be supported.
Here are a few issues:
Though from the posts in (2) it seems there might be an alternative package that could help (https://pypi.org/project/docxcompose/).
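For the narrower goal of dropping the last section (rather than the last rendered page), "digging into the guts" might look roughly like the sketch below. It relies on how WordprocessingML stores sections: every section except the last ends with a `w:sectPr` inside the `w:pPr` of its final paragraph, while the last section's `w:sectPr` is a direct child of `w:body`. The names `remove_last_section` and `W` are made up for this illustration and are not part of python-docx. It only looks at top-level paragraphs and ignores headers, footers and other related parts, so treat it as a starting point rather than a tested solution.

```python
# Namespace prefix for WordprocessingML tags, in Clark notation.
W = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'


def _section_break(paragraph):
    """Return the w:sectPr stored in a paragraph's w:pPr, or None."""
    p_pr = paragraph.find(W + 'pPr')
    return None if p_pr is None else p_pr.find(W + 'sectPr')


def remove_last_section(body):
    """Delete everything that belongs to the last section of a <w:body>.

    `body` is the <w:body> element (hypothetical helper, not python-docx API).
    The previous section's properties are promoted to body level so that it
    becomes the new last section.
    """
    children = list(body)
    boundary = None
    for index, child in enumerate(children):
        if child.tag == W + 'p' and _section_break(child) is not None:
            boundary = index  # keep the LAST paragraph that ends a section
    if boundary is None:
        raise ValueError('document has only one section')

    # Everything after the boundary paragraph belongs to the last section,
    # including the body-level w:sectPr.
    for child in children[boundary + 1:]:
        body.remove(child)

    # Promote the previous section's properties to body level.
    p_pr = children[boundary].find(W + 'pPr')
    sect_pr = p_pr.find(W + 'sectPr')
    p_pr.remove(sect_pr)
    body.append(sect_pr)
    return body


# Possible usage with python-docx, assuming document.element.body is the
# lxml <w:body> element:
#   document = docx.Document('test.docx')
#   remove_last_section(document.element.body)
#   document.save('removed.docx')
```

The function deliberately uses only `find`, `remove` and `append`, which behave the same way on lxml elements (what python-docx exposes) and on the standard library's ElementTree elements.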
Edit: This is as far as I got. It's quite kludgy and I think it's partially broken, but it worked in a very quick, basic test, although it left a blank page at the end. This definitely doesn't solve the question, but it could be a starting point for digging further.
import docx


def get_last_page_break(document):
    """Return 1-based (paragraph, run) positions of the last page break, or None."""
    last_break = None
    for para_index, paragraph in enumerate(document.paragraphs, start=1):
        for run_index, run in enumerate(paragraph.runs, start=1):
            xml = run._element.xml
            is_soft = 'lastRenderedPageBreak' in xml  # soft page break
            is_hard = 'w:br' in xml and 'type="page"' in xml  # hard page break
            if is_soft or is_hard:
                last_break = (para_index, run_index)
    return last_break


def kludgy_remove_last_page(document):
    """Copy the text up to the last page break into a new document."""
    new_doc = docx.Document()
    last_break = get_last_page_break(document)
    if last_break is None:
        raise ValueError('no page break found in document')
    last_para, last_run = last_break
    for para_index, para in enumerate(document.paragraphs[:last_para], start=1):
        new_para = new_doc.add_paragraph()
        # Only the paragraph that contains the last break is cut short.
        runs = para.runs[:last_run] if para_index == last_para else para.runs
        for run in runs:
            # Only the text is copied; formatting is lost.
            new_para.add_run(run.text)
            xml = run._element.xml
            # The run holding the final break is copied too, which is
            # probably why a blank page is left at the end.
            if 'w:br' in xml and 'type="page"' in xml:  # hard page break
                new_doc.add_page_break()
    return new_doc


d = docx.Document('test.docx')
new_doc = kludgy_remove_last_page(d)
new_doc.save('removed.docx')